ID  A1 091 41 




FINAL  REPORT:  EFFICIENT  METHODS  OF  ESTIMATING 
THE  OPERATING  CHARACTERISTICS  OF  ITEM  RESPONSE 
CATEGORIES  AND  CHALLENGE  TO  A  NEW  MODEL  FOR 
THE  MULTIPLE-CHOICE  ITEM 


DEPARTMENT  OF  PSYCHOLOGY 
UNIVERSITY  OF  TENNESSEE 
KNOXVILLE,  TENN.  37996-0900 


NOVEMBER,  1981 




Prepared under contract number N00014-77-C-0360,
NR 150-402 with the

Personnel and Training Research Programs
Psychological Sciences Division
Office of Naval Research


Approved for public release; distribution unlimited.
Reproduction in whole or in part is permitted for
any purpose of the United States Government.






Unclassified

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 


1. REPORT NUMBER

Final  Report 


4. TITLE (and Subtitle): Final Report: Efficient Methods of Estimating the Operating Characteristics of Item Response Categories and Challenge to a New Model for the Multiple-Choice Item

5. TYPE OF REPORT & PERIOD COVERED: Technical Report


6. PERFORMING ORG. REPORT NUMBER


7. AUTHOR(s)

Dr.  Fumiko  Samejima 


9. PERFORMING ORGANIZATION NAME AND ADDRESS

Department  of  Psychology 
University  of  Tennessee 
Knoxville,  Tennessee  37916 


11. CONTROLLING OFFICE NAME AND ADDRESS

Personnel  and  Training  Research  Programs 
Office  of  Naval  Research 
Arlington,  Virginia  22217 


8. CONTRACT OR GRANT NUMBER(s)

N00014-77-C-0360 


12. REPORT DATE


13.  NUMBER  OF  PAGES 

243 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report)

Unclassified 


15a. DECLASSIFICATION/DOWNGRADING SCHEDULE


16. DISTRIBUTION STATEMENT (of this Report)

Approved  for  public  release;  distribution  unlimited.  Reproduction  in 
whole  or  in  part  is  permitted  for  any  purpose  of  the  United  States 
Government.


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)




19. KEY WORDS (Continue on reverse side if necessary and identify by block number)


Operating  Characteristic  Estimation 

Tailored  Testing 

Latent  Trait  Theory 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)


(Please  see  reverse  side) 



FINAL  REPORT:  EFFICIENT  METHODS  OF  ESTIMATING  THE  OPERATING 
CHARACTERISTICS  OF  ITEM  RESPONSE  CATEGORIES  AND  CHALLENGE 
TO  A  NEW  MODEL  FOR  THE  MULTIPLE-CHOICE  ITEM 

ABSTRACT 

This is the final report for research contract
N00014-77-C-0360, which began on May 1, 1977, and ended on
September 30, 1981. It systematizes the major findings of the
research, and offers perspectives on, and directions for, future
research.


This final report was completed with the dedicated
assistance of Paul S. Changas, Charles McCarter,
Shiao-Yung Chen, Zailan Bte Mohamed, and Pat Palmer.

In addition to these five, more than twenty people came to the
principal investigator's laboratory to help her at one time or another
during the past four years and five months. The principal investigator
would like to thank all of them. Among others, Paul S. Changas,
C. I. Bonnie Chen, Philip S. Livingston, Charles T. McCarter, Melanie
Perkins, Robert L. Trestman and Ching-Chan Yeh deserve the author's
special thanks.




TABLE  OF  CONTENTS 




Page 

Preface  1 

I  General  Background  3 

II  Research  Reports  6 

III Estimation of the Operating Characteristics of the Discrete Item Responses and That of Ability Distributions: I  9

(III.1) Relationship between the Estimation of the Operating Characteristics and That of Ability Distributions  9

(III.2) No Mathematical Forms Are Assumed for the Operating Characteristics of the Unknown Test Items  11

(III.3) Small Number of Examinees in the Calibration Data  12

(III.4) Old Test  13

(III.5) Set of Five Hundred Maximum Likelihood Estimates  17

(III.6) Unknown Test Items Whose Operating Characteristics Are to Be Estimated  18

(III.7) Use of Robust, Indirect Information  19

(III.8) Transformation of Ability θ to τ  23

IV Method of Moments As the Least Squares Solution for Fitting a Polynomial  28

(IV.1) Approximation to the Density Function from a Set of Observations  28

(IV.2) Method of Moments As the Least Squares Solution for Fitting a Polynomial  32

(IV.3) Direct Use of the Least Squares Solution  34

(IV.4) Solution by the Method of Moments  35

(IV.5) Expanded Use of the Method of Moments  39

(IV.6) Selection of the Interval  40

(IV.7) Comparison of the Results Obtained by the Method of Moments and by the Direct Least Squares Procedure  43


V Estimation of the Operating Characteristics of the Discrete Item Responses and That of Ability Distributions: II  46

(V.1) Estimated Operating Characteristics Which Are Directly Observable from Our Calibration Data  46

(V.2) Necessary Correction for the Scale of the Maximum Likelihood Estimate When Used As a Substitute for the Ability Scale  47

(V.3) Transformation of θ to τ Using the Method of Moments for Fitting a Polynomial  50

(V.4) Classification of Methods and Approaches  53

(V.5) Normal Approximation Method  54

(V.6) Approximation to the Density Function of the Maximum Likelihood Estimate by a Polynomial Obtained by the Method of Moments  56

(V.7) Pearson System Method  59

(V.8) Two-Parameter Beta Method  61

(V.9) Normal Approach Method  64

(V.10) Bivariate P.D.F. Approach  64

(V.11) Histogram Ratio Approach  69

(V.12) Curve Fitting Approach  72

(V.13) Conditional P.D.F. Approach  73

(V.14) Remark on the Approximation of φ(x | x) by a Normal Density Function  89

VI Estimation of the Operating Characteristics of the Discrete Item Responses and That of Ability Distributions: III  92

(VI.1) Objective Testing and Exchangeability  92

(VI.2) Every Test Has a Limitation  97

(VI.3) Alternative Estimators for the Maximum Likelihood Estimator  98

(VI.4) Bayes Estimator with a Uniform Density as the Prior  106

(VI.5) Subtest 3  108

(VI.6) Nine Subtests As Our Old Test  114

(VI.7) Sample Linear Regression of x_8 on x_S  116

(VI.8) Polynomial Approximation to the Density Function, g(x)  120

(VI.9) Estimated Item Characteristic Functions Obtained upon Subtests 1, 2 and 3  120

(VI.10) Estimated Item Characteristic Functions Obtained upon the Six Other Subtests  129

VII Adaptive Testing  137

(VII.1) Addition of New Test Items to the Item Pool  137

(VII.2) Weakly Parallel Tests  138

(VII.3) Use of the Amount of Test Information as the Criterion for Terminating the Presentation of New Test Items  139

(VII.4) Test Information Function and Standard Error of Estimation  142

(VII.5) Old Test for Item Calibration  143

(VII.6) Adaptive Testing Using Graded Test Items  144

(VII.7) Bayesian vs. Maximum Likelihood Estimation in Adaptive Testing  147

VIII Constant Information Model  152

(VIII.1) Constancy of Information under the Transformation of the Latent Trait  152

(VIII.2) Constancy of Item Information for a Specified Model  153

(VIII.3) Constancy of Item Information for a Set of Models  156

(VIII.4) Exact Area under the Square Root of the Item Information Function  157

(VIII.5) Constant Information Model  158

(VIII.6) Use of Constant Information Model for a Set of Equivalent Test Items Which Substitutes for the Old Test  162

(VIII.7) How to Detect a Subset of Equivalent Binary Items  163

(VIII.8) Convergence of the Conditional Distribution of the Maximum Likelihood Estimate to the Asymptotic Normality When a Test Consists of Equivalent Items  165

IX A New Family of Models for the Multiple-Choice Test Item: I  173

(IX.1) Mathematical Models and Psychological Reality  173


(IX. 2)  Three  Parameter  Logistic  Model  174 

(IX. 3)  Tokyo  Research  175 

(IX. 4)  Sato’s  Index  k  175 

(IX.5) Index k* for the Validation Study of the Three-Parameter Logistic Model  179

(IX. 6)  Simulation  Study  on  Index  k*  181 

(IX. 7)  Iowa  Tests  of  Basic  Skills  184 

(IX. 8)  Original  and  Revised  Iowa  Data  186 

(IX. 9)  Informative  Distractor  Model  189 

(IX. 10)  Equivalent  Distractor  Model  191 

(IX. 11)  Index  k*  for  the  Invalidation  of  the 

Equivalent  Distractor  Model  192 

(IX. 12)  Results  Obtained  by  Using  Index  k*  on 

Iowa  Data  193 

(IX. 13)  Comparison  of  the  Results  on  Common  Test 
Items  for  Three  Levels  of  Examinees  in 
Iowa  Study  197 

(IX. 14)  Remarks  on  the  Usage  of  Index  k*  201 

X  A  New  Family  of  Models  for  the  Multiple-Choice 

Test  Item:  II  203 

(X. 1)  Shiba’s  Word  Comprehension  Tests  203 

(X. 2)  Subjects  Used  in  Shiba's  Research  204 

(X.3)  Methods  and  Results  of  Shiba's  Research  204 

(X.4)  Distractors  As  Resources  of  Information  210 

(X.5) Mathematical Models in Physics and in Psychology  210

(X. 6)  Normal  Ogive  Model  on  the  Graded  Response 

Level  and  Bock’s  Multinomial  Model  211 

(X.7)  A  New  Family  of  Models  for  the  Multiple- 

Choice  Test  Items  217 

(X.8)  Basic  Functions  and  Information  Functions 

of  the  New  Models  222 

(X.9)  Instructions  and  Mathematical  Models  230 

(X.10)  A  New  Approach  to  Data  Analysis  231 


XI Conclusions  236


PREFACE 


Four years and five months have passed since I started this
research on May 1, 1977, and these were hectic years. During this
period, many things were designed and accomplished. Although I am
the principal investigator, I find it practically impossible to
include and systematize all the important findings and implications
within a single final report. I have, however, done my best within a
limited amount of time. The present report clearly needs further
supplementation and revision. I plan to do so and to use the
result at the Advanced Seminar on Latent Trait Theory, which will
be held in spring 1982 in the vicinity of Knoxville, Tennessee,
under the sponsorship of the Office of Naval Research.

There  were  four  objectives  in  the  original  research  proposal, 
and  they  can  be  summarized  as  follows. 

[1]  Investigation  of  theory  and  method  for  estimating  the 
operating  characteristics  of  discrete  item  responses, 
without  assuming  any  specific  mathematical  forms,  and 
without  using  too  many  examinees  in  the  whole  procedure. 

[2]  Investigation  of  the  speed  factor  working  in  combination 
with  the  power  factor  in  intellectual  performance. 

[3] Investigation of the random guessing behavior in testing,
and the development of a new model, or new models, for
the multiple-choice item.

[4] Investigation of efficient methods of estimating the ability
distribution for any specific group of examinees.

Of these four objectives, Objective [1], together with Objective
[4], was pursued very intensively, and this part of the research was
the most productive. Objective [3] was also pursued successfully, and
it provided us with valuable perspectives and directions for future
research. In contrast to these three, Objective [2]
was more or less dropped. To compensate for it, however, several
other topics were pursued, such as a new mathematical model for
the binary item called the Constant Information Model, the method of
moments as the least squares solution for fitting a polynomial,
Bayesian estimation of ability, and alternative estimators for the
maximum likelihood estimator for the two extreme response patterns.
All of these additional topics are related to the proposed
objectives, but they also have value of their own.

Recently, some researchers have started using the title
Item Response Theory instead of Latent Trait Theory; the former
was, I believe, first proposed by Dr. Frederic M. Lord.
Although I have a great deal of respect for Dr. Lord for his long,
brilliant career as a researcher and scholar, I prefer Latent Trait
Theory. One of the reasons for my preference is that I see no
reason why the title should be changed after so many years of presentations
and publications of papers under the title of Latent Trait Theory,
which include my own paper presented at the Fifth International
Symposium on Multivariate Analysis and published as a chapter in
Multivariate Analysis V (Krishnaiah, Ed., 1978). I feel that
changing the title would cause more confusion than anything else,
not only among psychologists but also among mathematicians and
mathematical statisticians who have become familiar with the Theory.
Secondly, the term Item Response Theory has been used mainly by
researchers whose interest is in the three-parameter logistic model
in the unidimensional latent space. For research such as mine,
which covers broader areas and even includes the multidimensional
latent space, Latent Trait Theory sounds more appropriate.
In the present report, therefore, Latent Trait Theory is used
exclusively as the general title, instead of Item Response Theory.

September  30,  1981 


Author 




I  General  Background 

Latent Trait Theory can be traced back to the nineteen-forties, in
the work of Lawley (Lawley, 1943) and others. In the nineteen-fifties,
psychometricians such as Tucker and Lord developed the basic theory as a
mental test theory, and Lord, among others, integrated it and published it
as a Psychometric Monograph (Lord, 1952). These early works by
psychometricians were joined by latent structure analysis, which had
been developed by Lazarsfeld (Lazarsfeld, 1959) and others as a theory
of social attitude measurement in sociology, and also by the
work of Rasch (Rasch, 1960) in the context of mental
measurement. These pioneering works led to a comprehensive system of
Latent Trait Theory.

The modern mental test theory thus established originally adopted
the normal ogive model for the conditional probability of the correct
answer given ability, that is, the item characteristic function, of the
dichotomously scored test item. In the nineteen-sixties, Birnbaum (Birnbaum,
1968) proposed the logistic model, an approximation to the normal ogive
model whose benefit is mathematical simplicity: the vector of binary item
scores, or the response pattern, has a simple sufficient statistic.
Birnbaum also proposed the three-parameter logistic model for the
multiple-choice test item, which is a modification of the logistic model
based upon the knowledge or random guessing principle. Samejima
(Samejima, 1969) expanded the theory to include both the nominal and
graded response levels, in addition to the dichotomous response level.
The graded response level assumes the integers 0 through $m_g$ (> 1) as
the score of item $g$, and is further classified into two cases, the
homogeneous case and the heterogeneous case (Samejima, 1972). With this
generalization, a single item characteristic function per test item was
no longer enough, and the conditional probability, given ability, of each
of the discrete responses to an item, that is, its operating characteristic,
was introduced. Both the normal ogive model and the logistic model were
expanded for the homogeneous case of the graded response level, providing
ordered, unimodal operating characteristics for all the intermediate
response categories.
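As a rough illustration of the models just named, the Python sketch below (NumPy only) writes out the normal ogive and logistic item characteristic functions, Birnbaum's three-parameter modification, and the operating characteristics of the homogeneous graded case, obtained as differences of adjacent cumulative (boundary) curves. The scaling constant D = 1.7, the parameter names `a`, `b`, `c` and `bs`, and all numerical values are assumptions of this example, not notation or data taken from the report.

```python
import math

import numpy as np

D = 1.7  # conventional scaling constant for the logistic approximation (assumed)

_erf = np.vectorize(math.erf)


def normal_ogive(theta, a, b):
    """Normal ogive item characteristic function: standard normal CDF of a*(theta - b)."""
    z = a * (np.asarray(theta, dtype=float) - b)
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def logistic(theta, a, b):
    """Logistic item characteristic function with scaling constant D."""
    return 1.0 / (1.0 + np.exp(-D * a * (np.asarray(theta, dtype=float) - b)))


def three_parameter_logistic(theta, a, b, c):
    """Logistic model with lower asymptote c (knowledge or random guessing)."""
    return c + (1.0 - c) * logistic(theta, a, b)


def graded_operating_characteristics(theta, a, bs):
    """Homogeneous graded case built from logistic boundary curves.

    bs holds m strictly increasing boundary parameters. The operating
    characteristic of score k is P*_k(theta) - P*_{k+1}(theta), with
    P*_0 = 1 and P*_{m+1} = 0. Returns an array of shape (m + 1, n_theta).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    cumulative = np.vstack(
        [np.ones_like(theta)]
        + [logistic(theta, a, b) for b in bs]
        + [np.zeros_like(theta)]
    )
    return cumulative[:-1] - cumulative[1:]


# Illustrative (hypothetical) parameter values only.
grid = np.linspace(-4.0, 4.0, 801)
max_gap = np.max(np.abs(normal_ogive(grid, 1.0, 0.0) - logistic(grid, 1.0, 0.0)))
ocs = graded_operating_characteristics(grid, a=1.2, bs=[-1.0, 0.0, 1.5])
print(f"largest normal ogive / logistic gap: {max_gap:.4f}")
```

With D = 1.7 the logistic curve stays within 0.01 of the normal ogive everywhere, which is the sense in which it serves as an approximation. Because each boundary curve decreases as its boundary parameter increases, the graded operating characteristics are nonnegative and sum to one at every θ.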




Sufficient conditions for a model to have a unique maximum of the
operating characteristic of each and every response pattern were
investigated and postulated. Bock (Bock, 1972) proposed a multinomial
response model, which can be interpreted either as a model on the nominal
response level or as a model in the heterogeneous case of the graded
response level. Samejima (Samejima, 1973) also proposed several models
on the continuous response level, defining the operating density
characteristic for each continuous item response, and later (Samejima,
1974) she expanded them to the multidimensional latent space.
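Bock's multinomial model can be sketched in a few lines. The block below assumes the usual softmax form, in which each category has its own slope and intercept; the function name and the parameter values are hypothetical and chosen only to show the shape of the curves. No identification constraint is imposed here: adding the same constant to every slope, or to every intercept, leaves the probabilities unchanged, which is why constraints such as zero-sum parameters are usually added in estimation.

```python
import numpy as np


def nominal_operating_characteristics(theta, slopes, intercepts):
    """Multinomial (nominal) response model as a softmax over categories.

    P_k(theta) = exp(a_k*theta + c_k) / sum_h exp(a_h*theta + c_h).
    Returns an array of shape (n_categories, n_theta).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    a = np.asarray(slopes, dtype=float)[:, None]
    c = np.asarray(intercepts, dtype=float)[:, None]
    z = a * theta[None, :] + c
    z -= z.max(axis=0, keepdims=True)  # subtract the column maximum for numerical stability
    expz = np.exp(z)
    return expz / expz.sum(axis=0, keepdims=True)


# Hypothetical three-category item; slope and intercept values are illustrative only.
grid = np.linspace(-3.0, 3.0, 61)
probs = nominal_operating_characteristics(
    grid, slopes=[-1.0, 0.0, 1.0], intercepts=[0.0, 0.5, 0.0]
)
```

The category with the largest slope dominates at high ability and the one with the smallest slope at low ability, while the probabilities of all categories sum to one at every θ.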

In contrast to the development of the theory, its applications
still lag far behind. For one thing, the theory has not been well
understood or used by most applied researchers. Many psychologists
still bury themselves in the tautology of classical mental test
theory, although it has been pointed out (Samejima, 1977) that such core
concepts of classical test theory as the reliability coefficient and the
validity coefficient of a test are highly irrelevant and misleading, and
that the information functions in Latent Trait Theory provide a
far more relevant set of information.

In the past decade, the Rasch model has become increasingly popular
among certain applied researchers. The development of adaptive testing,
or tailored testing, has also made the three-parameter logistic model
popular among researchers of mental measurement. The growing
popularity of these two models does not always rest upon their relevance,
however. Researchers tend to choose one of these models fairly
arbitrarily, because of its availability and ease of handling rather
than out of scientific conviction. Worst of all, very little effort has
been put into model validation, which is essential in any scientific
research.

The orientation we aim at in the present study is quite different
from the general trends described in the preceding paragraph. We
consider ourselves slaves to the truth, rather than masters who can
choose their models as they wish and for their own convenience. This
orientation leads us to emphasize eliminating as many assumptions as
possible, and validating a model whenever we use one. The author hopes
that the present study will stimulate some of the researchers who follow
the general trends to change their ways, taking harder paths to reach
productivity in a truly scientific sense.


References 


[1] Birnbaum, A. Some latent trait models and their use in inferring
an examinee's ability. In F. M. Lord and M. R. Novick,
Statistical theories of mental test scores. Chapters 17-20.
Reading, Mass.: Addison-Wesley, 1968.

[2] Bock, R. D. Estimating item parameters and latent ability when
responses are scored in two or more nominal categories.
Psychometrika, 1972, 37, 29-51.

[3] Lawley, D. N. On problems connected with item selection and test
construction. Proceedings of the Royal Society of Edinburgh,
1943, 61, 273-287.

[4] Lazarsfeld, P. F. Latent structure analysis. In S. Koch (Ed.),
Psychology: A study of a science. Vol. 3. New York: McGraw-Hill,
1959, 476-542.

[5] Lord, F. M. A theory of test scores. Psychometric Monograph,
No. 7, 1952.

[6] Rasch, G. Probabilistic models for some intelligence and
attainment tests. Copenhagen: Nielsen and Lydiche, 1960.

[7] Samejima, F. Estimation of ability using a response pattern of
graded scores. Psychometrika Monograph, No. 17, 1969.

[8] Samejima, F. A general model for free-response data.
Psychometrika Monograph, No. 18, 1972.

[9] Samejima, F. Homogeneous case of the continuous response level.
Psychometrika, 1973, 38, 203-219.

[10] Samejima, F. Normal ogive model on the continuous response level
in the multidimensional latent space. Psychometrika, 1974,
39, 111-121.

[11] Samejima, F. A use of the information function in tailored
testing. Applied Psychological Measurement, 1977, 1, 233-247.




II  Research  Reports 

Nineteen technical reports were published during the contract
period. All of them except three were written by the principal
investigator alone. The three technical reports RR-79-2, RR-80-1 and
RR-81-3 were coauthored by the principal investigator with Philip
Livingston, Robert Trestman and Paul Changas, respectively. There is
one Scientific Monograph published by the Tokyo Office of the Office of
Naval Research in 1980. There are two papers in the proceedings of the
Computerized Adaptive Testing Conference, in 1977 and in 1979,
respectively. The titles of these twenty-two research reports are
listed on the following pages.

In addition, during the contract period, the principal
investigator introduced some of the products and findings of the present
research in an invited paper at the Fifth International Symposium on
Multivariate Analysis, held at the University of Pittsburgh in 1978.
The paper, entitled Latent Trait Theory and Its Applications, was
published in Multivariate Analysis V (Krishnaiah, Ed.; North-Holland,
1980).

The twenty-two research reports can roughly be categorized into
seven groups, and in the list they are marked with different symbols
accordingly. Eleven papers are marked with A. All of them concern the
estimation of the operating characteristics of discrete item responses
and the estimation of the ability distribution. The method of moments as
the least squares solution for fitting a polynomial is discussed in one
paper, which is marked with 3. Two papers, marked with ¥, concern the new
family of models for the multiple-choice test item. One paper, marked
with «, is an empirical study concerning the multiple-choice test item,
and is related to the previous two. Three papers, marked with <f,
concern the Constant Information Model, a new model proposed by the
principal investigator. Two papers on computerized adaptive testing are
marked with Q. Partly related to these two, there are two papers
concerning Bayesian vs. maximum likelihood estimation of ability, which
are marked with H.

LIST OF ONR TECHNICAL REPORTS AND OTHERS

A (1) Samejima, F. Estimation of the operating characteristics of
item response categories I: Introduction to the Two-Parameter
Beta Method. RR-77-1, 1977.

Q (2) Samejima, F. The application of graded response models. Proceedings
of the 1977 Computerized Adaptive Testing Conference (D. J. Weiss,
Ed.), 28-37, 1977.

Q (3) Samejima, F. Future directions for computerized adaptive testing
(panel discussion). Proceedings of the 1977 Computerized
Adaptive Testing Conference (D. J. Weiss, Ed.), 430-440, 1977.

A (4) Samejima, F. Estimation of the operating characteristics of item
response categories II: Further development of the Two-Parameter
Beta Method. RR-78-1, 1978.

A (5) Samejima, F. Estimation of the operating characteristics of item
response categories III: The Normal Approach Method and the
Pearson System Method. RR-78-2, 1978.

A (6) Samejima, F. Estimation of the operating characteristics of item
response categories IV: Comparison of the different methods.
RR-78-3, 1978.

A (7) Samejima, F. Estimation of the operating characteristics of item
response categories V: Weighted Sum Procedure in the
Conditional P.D.F. Approach. RR-78-4, 1978.

A (8) Samejima, F. Estimation of the operating characteristics of item
response categories VI: Proportioned Sum Procedure in the
Conditional P.D.F. Approach. RR-78-5, 1978.

A (9) Samejima, F. Estimation of the operating characteristics of item
response categories VII: Bivariate P.D.F. Approach with Normal
Approach Method. RR-78-6, 1978.

<f (10) Samejima, F. Constant Information Model: A new, promising item
characteristic function. RR-79-1, 1979.

3 (11) Samejima, F. and P. S. Livingston. Method of moments as the least
squares solution for fitting a polynomial. RR-79-2, 1979.

<f (12) Samejima, F. Convergence of the conditional distribution of the
maximum likelihood estimate, given latent trait, to the
asymptotic normality: Observations made through the Constant
Information Model. RR-79-3, 1979.

¥ (13) Samejima, F. A new family of models for the multiple-choice
item. RR-79-4, 1979.

<f (14) Samejima, F. Constant Information Model on the dichotomous response
level. Proceedings of the 1979 Computerized Adaptive
Testing Conference (D. J. Weiss, Ed.), 145-163, 1979.

¥ (15) Samejima, F. Research on the multiple-choice test item in Japan:
Toward the validation of mathematical models. ONR-Tokyo
Scientific Monograph 3, April 1980.

« (16) Samejima, F. & R. L. Trestman. Analysis of Iowa data I: Initial
study and findings. RR-80-1, 1980.

A (17) Samejima, F. Estimation of the operating characteristics when
the test information of the Old Test is not constant I:
Rationale. RR-80-2, 1980.

H (18) Samejima, F. Is Bayesian estimation proper for estimating the
individual's ability? RR-80-3, 1980.

A (19) Samejima, F. Estimation of the operating characteristics when
the test information of the Old Test is not constant II:
Simple Sum Procedure of the Conditional P.D.F. Approach/
Normal Approach Method using three subtests of the Old Test.
RR-80-4, 1980.

H (20) Samejima, F. An alternative estimator for the maximum likelihood
estimator for the two extreme response patterns. RR-81-1, 1981.

A (21) Samejima, F. Estimation of the operating characteristics when the
test information of the Old Test is not constant II: Simple
Sum Procedure of the Conditional P.D.F. Approach/Normal
Approach Method using three subtests of the Old Test. No. 2.
RR-81-2, 1981.

A (22) Samejima, F. & P. S. Changas. How small the number of the test
items can be for the basis of estimating the operating
characteristics of the discrete responses to unknown test
items. RR-81-3, 1981.

The contents and main findings of these papers will be
integrated and summarized in the following chapters. The reader will
also see how these seemingly separate topics are related, and
how they can be used together to accomplish useful research.




III Estimation of the Operating Characteristics of the Discrete Item
Responses and That of Ability Distributions: I

As we have seen in the preceding chapter, eleven papers were
written on these two subjects, and one paper on the method of moments,
which plays an important role in the methods and approaches for these
estimations. In the present chapter, we shall begin integrating the
rationale, data and methods of this part of the research, and organize
them into several sections.

(III.1) Relationship between the Estimation of the Operating
Characteristics and That of Ability Distributions

By discrete item responses we mean any discrete answers to the
item, including both free responses and multiple-choice responses. When
free responses are treated as they are, or more or less categorized
according to their mutual similarities, they provide us with nominal
responses. If we use a dichotomous scoring strategy by categorizing them
into two categories, i.e., "correct" and "incorrect", then they are
treated as dichotomous responses. If we adopt a graded scoring
strategy by categorizing them into more than two ordered categories,
i.e., 0 through $m_g$ for item $g$, according to their closeness to the
correct answer, then they are treated as graded responses. In each case,
we have discrete item responses.

Let $\theta$ be ability, or latent trait, which assumes any real number.
Let $f(\theta)$ be the density function of ability $\theta$ for a given
group of examinees. We denote the set of all the discrete responses to
item $g$ by $K_g$, and its element by $k_g$ or $h_g$. Then the density
function $f(\theta)$ can be written as

$$ f(\theta) = \sum_{k_g \in K_g} f_{k_g}(\theta)\, p(k_g) , \tag{3.1} $$

where $f_{k_g}(\theta)$ is the density function of ability $\theta$ for
the subgroup of examinees whose responses to item $g$ are uniformly
$k_g$, and $p(k_g)$ is the probability assigned to that subgroup within
the total group of examinees. For the operating characteristic,
$P_{k_g}(\theta)$, of the discrete item response $k_g$, we can then write

$$ P_{k_g}(\theta) = f_{k_g}(\theta)\, p(k_g) \Big[ \sum_{h_g \in K_g} f_{h_g}(\theta)\, p(h_g) \Big]^{-1} . \tag{3.2} $$

Equation (3.2) indicates that the estimated operating characteristic
of a discrete item response can be obtained as the ratio of its
estimated absolute frequency of ability to the absolute frequency for the
whole set, $K_g$. Throughout the present study, this ratio is the
estimated  operating  characteristic  we  adopt.  Any  method  for  estimating 
the  operating  characteristics  of  discrete  item  responses  includes, 
therefore,  the  estimation  of  two  or  more  ability  distributions.  In  other 
words,  those  methods  and  approaches  developed  in  the  present  study  are 
not  only  for  the  estimation  of  the  operating  characteristics  but  also  for 
the  estimation  of  ability  distributions. 
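Equation (3.2) can be illustrated numerically. The sketch below is a minimal example under hypothetical assumptions: a binary item whose two response subgroups have normal ability densities $f_{k_g}(\theta)$ with means $-0.5$ and $0.5$, and subgroup proportions $p(k_g)$ of 0.4 and 0.6. These densities, proportions and function names are illustrative only, not data or code from the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density, used here only as a hypothetical subgroup density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def operating_characteristics(theta, subgroups):
    """Equation (3.2): P_k(theta) = f_k(theta) p(k) / sum_h f_h(theta) p(h).

    `subgroups` maps each response k to (density function f_k, proportion p(k)).
    """
    weighted = {k: dens(theta) * p for k, (dens, p) in subgroups.items()}
    total = sum(weighted.values())  # f(theta), equation (3.1)
    return {k: w / total for k, w in weighted.items()}


# Hypothetical binary item: 40% answer incorrectly (k = 0), 60% correctly (k = 1).
subgroups = {
    0: (lambda t: normal_pdf(t, -0.5, 1.0), 0.4),
    1: (lambda t: normal_pdf(t, 0.5, 1.0), 0.6),
}

for theta in (-2.0, 0.0, 2.0):
    print(theta, operating_characteristics(theta, subgroups))
```

Because only the weighted subgroup densities enter (3.2), any factor common to all of them cancels in the ratio.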

There  is  a  certain  invariance  property  in  the  estimated  operating 
characteristic  over  the  transformation  of  the  latent  trait,  which  is  not 
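shared by the estimated probability density of ability. Let $\tau$ be a
strictly increasing and differentiable function of $\theta$. We have for the
densities, $f^*(\tau)$ and $f^*_{k_g}(\tau)$, for the transformed latent trait $\tau$,
such that

(3.3)  $$ f^*(\tau) = f(\theta) \frac{d\theta}{d\tau} , $$

and

(3.4)  $$ f^*_{k_g}(\tau) = f_{k_g}(\theta) \frac{d\theta}{d\tau} , $$

for any discrete response $k_g \in K_g$. From (3.2) and (3.4) it is obvious,
since the common factor $d\theta/d\tau$ cancels between the numerator and the
denominator, that for the operating characteristic, $P^*_{k_g}(\tau)$, we have

(3.5)  $$ P^*_{k_g}(\tau) = P_{k_g}(\theta) , $$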

which  indicates  the  invariance  of  the  estimated  operating  characteristic 
over  the  transformation  of  the  latent  trait. 
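A quick numerical check of the invariance (3.5), and of the non-invariance of the densities themselves, is sketched below. The transformation $\tau = e^\theta$, the normal subgroup densities and the proportions are hypothetical choices made only for this illustration.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical subgroup densities f_k(theta) and proportions p(k) (illustrative only).
DENSITIES = {0: lambda t: normal_pdf(t, -0.5, 1.0), 1: lambda t: normal_pdf(t, 0.5, 1.0)}
PROPORTIONS = {0: 0.4, 1: 0.6}


def oc_theta(theta):
    """Operating characteristics on the theta scale, equation (3.2)."""
    w = {k: DENSITIES[k](theta) * PROPORTIONS[k] for k in DENSITIES}
    s = sum(w.values())
    return {k: v / s for k, v in w.items()}


def oc_tau(tau):
    """Same quantities on the tau = exp(theta) scale, using (3.4) for the densities."""
    theta = math.log(tau)
    dtheta_dtau = 1.0 / tau
    w = {k: DENSITIES[k](theta) * dtheta_dtau * PROPORTIONS[k] for k in DENSITIES}
    s = sum(w.values())
    return {k: v / s for k, v in w.items()}


def total_density_theta(theta):
    return sum(DENSITIES[k](theta) * PROPORTIONS[k] for k in DENSITIES)


def total_density_tau(tau):
    """Equation (3.3): f*(tau) = f(theta) dtheta/dtau."""
    return total_density_theta(math.log(tau)) / tau


theta = 0.7
print(oc_theta(theta)[1], oc_tau(math.exp(theta))[1])  # same value, as in (3.5)
print(total_density_theta(theta), total_density_tau(math.exp(theta)))  # different
```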




(III.2) No Mathematical Forms Are Assumed for the Operating
Characteristics of the Unknown Test Items

Most researchers assume in advance some mathematical model for the
operating characteristics of the item responses of their unknown
test items. In such a case, the estimation of the operating
characteristics is reduced to the estimation of a small number of
item parameters. This simplification makes our research easier to
conduct. On the other hand, in so doing, we may distort the
psychological reality, which is the very object of our research, by
forcing it into an inappropriate model. Thus both the deductive and
the inductive validation of the model are of the utmost importance
whenever we adopt a mathematical model. In other words, the model
must follow a rationale which also explains the psychological reality
behind our data, and, once the data have been analyzed, we must
validate the model by checking whether the results are internally
consistent.

The importance of model validation seems to be forgotten by many
researchers, however. To give an example, the popularity of the
Rasch model depends mainly upon its mathematical simplicity, which
comes from the fact that it has only one item parameter, i.e., the
difficulty. Very few researchers stop to think, however, whether
this particular model and its simplicity are appropriate for their
data, nor do they try to examine the validity of the model by
checking the internal consistency of their results. Another example
is the way many researchers use the three-parameter logistic model
for their data on multiple-choice test items. The rationale behind
the model is the knowledge-or-random-guessing principle, which is
rather unlikely to hold in most multiple-choice testing situations.
In particular, their readiness to accept, as the third parameter,
i.e., the guessing parameter, a value which is less than the
reciprocal of the number of alternatives of the multiple-choice item
is self-defeating: under random guessing among the alternatives, the
lower asymptote should equal that reciprocal.
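The self-defeating nature of a guessing parameter below $1/A$, where $A$ is the number of alternatives, can be checked numerically. The sketch below assumes the usual logistic form with scaling constant $D = 1.7$ (a common convention, not something stated in this report) and a hypothetical item with $A = 5$; it shows that the three-parameter logistic model coincides with the knowledge-or-random-guessing rule exactly when $c = 1/A$.

```python
import math


def three_pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic item characteristic function with lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def knowledge_or_random_guessing(p_know, n_alternatives):
    """Probability of a correct answer if an examinee either knows the answer
    (probability p_know) or else guesses at random among n_alternatives."""
    return p_know + (1.0 - p_know) / n_alternatives


A, a, b = 5, 1.2, 0.3  # hypothetical item
for theta in (-3.0, 0.0, 2.5):
    p_know = 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))
    print(theta, three_pl(theta, a, b, 1.0 / A), knowledge_or_random_guessing(p_know, A))
```

Since the rule's probability of success never falls below $1/A$ for any non-negative probability of knowing, an estimated $c$ smaller than $1/A$ cannot arise from the mechanism that motivates the model.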

To avoid the possibility of adopting an inappropriate mathematical
model, the best solution is to develop methods of estimating the
operating characteristics of the discrete item responses without
assuming any mathematical forms. In the present study, this direct
approach to the operating characteristics is used consistently.
Although it creates more difficulty and requires more labor in
developing our methods and approaches, it is worth the effort for the
reasons given above. The reader will find similar attempts, i.e.,
estimation of the operating characteristics without assuming any
mathematical forms, in the works by Lord (Lord, 1970) and Levine
(Levine, 1980).

(III.3) Small Number of Examinees in the Calibration Data

For the relatively few researchers whose calibration data are
obtained from institutions like Educational Testing Service, it is easy
to use data collected from several hundred thousand examinees. For
most researchers who do their research in university environments,
however, the situation is quite different. It may be extremely
difficult for them to find even one thousand volunteer students as
subjects. For this reason, it is necessary that we investigate and
develop methods of estimating the operating characteristics which do
not require more than several hundred examinees for our calibration
data.

This  is  one  of  the  important  considerations  in  the  present 
study.  Our  calibration  data  are  based  upon  five  hundred  hypothetical 
examinees,  whose  ability  levels  are  at  one  hundred  equally  spaced 
positions  on  the  ability  dimension,  with  five  examinees  being  placed 
at  each  position.  This  configuration  can  be  considered  as  an 
approximation to a uniform distribution of ability. To be specific,
the one hundred ability levels range from -2.475 to 2.475 in equal
steps of 0.05. The uniform distribution therefore has the density 0.2
on the interval of ability $\theta$, $(-2.5, 2.5)$, as is shown in
Figure 3-3-1.
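The calibration configuration described above is easy to reproduce. This sketch builds the five hundred ability values and compares their variance with that of the uniform distribution on $(-2.5, 2.5)$, which is $5^2/12$; the variable names are illustrative.

```python
import statistics

# One hundred equally spaced ability levels, -2.475 to 2.475 in steps of 0.05,
# with five hypothetical examinees placed at each level (Section III.3).
LEVELS = [-2.475 + 0.05 * i for i in range(100)]
ABILITIES = [theta for theta in LEVELS for _ in range(5)]

# The approximated uniform distribution on (-2.5, 2.5) has density 1 / 5 = 0.2
# and variance 5**2 / 12; the discrete configuration comes very close to it.
print(len(ABILITIES), min(ABILITIES), max(ABILITIES))
print(statistics.pvariance(ABILITIES), 25.0 / 12.0)
```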


FIGURE 3-3-1

Ability Distribution of Our Hypothetical Examinees (horizontal axis:
latent trait $\theta$). Actually, the Five Hundred Examinees Are Placed at
the One Hundred Equally Spaced Positions from -2.475 to 2.475, with Five
Examinees Sharing Each Position.


(III.4) Old Test

It is assumed that there exists a set of test items whose operating
characteristics are known, and that our examinees have taken this test
as well as a set of test items whose operating characteristics are to
be estimated. We call the first set of test items Old Test, and the
estimation of the operating characteristics of the test items of the
second set is based upon the examinees' performances on the Old Test.

The methods and approaches developed on this assumption are directly
useful when, in adaptive testing, we have a well-constructed item pool
but want to add more test items to it. Another suitable situation is
one in which we have a relatively small number of well-developed test
items which have a high content validity for our purpose of
measurement, and for which, on a trial-and-error basis, we have
obtained a confirmed mathematical model or models for separate test
items with respect to their deductive and inductive validities, so
that we can use them as our Old Test.




This  assumption  of  the  existence  of  the  Old  Test  is  a 
restriction,  which  we  may  wish  to  eliminate  so  that  we  shall  be  able 
to  expand  the  applicability  of  our  methods  and  approaches  to  the 
situation  where  we  must  start  the  calibration  of  the  operating 
characteristics from scratch. Two different attempts toward this
goal will be discussed in a later chapter.

In  the  present  study,  a  set  of  thirty-five  test  items  has  been 
chosen  as  our  original  Old  Test.  Each  of  these  thirty-five  items  has 
three  graded  item  score  categories,  and  follows  the  normal  ogive 
model  such  that 


(3.6)  $$ P_{x_g}(\theta) = [2\pi]^{-1/2} \int_{a_g(\theta - b_{x_g+1})}^{a_g(\theta - b_{x_g})} e^{-u^2/2}\, du , $$

where $x_g$ $(= 0, 1, \ldots, m_g)$ is the graded item score of item $g$,
$P_{x_g}(\theta)$ is its operating characteristic, $a_g\ (> 0)$ is the item
discrimination parameter, and $b_{x_g}$ is the item response difficulty
parameter, which satisfies

(3.7)  $$ -\infty = b_0 < b_1 < b_2 < \cdots < b_{m_g} < b_{m_g+1} = \infty . $$

The  item  parameters  and  item  response  parameters  of  these  thirty- 
five  test  items  are  shown  in  Table  3-4-1  .  We  have  also  used  nine 
different  subtests  of  the  original  Old  Test  as  our  Old  Test  on 
different  occasions,  and  these  subtests  are  shown  in  the  same  table 
by  indicating  the  test  items  by  crosses.  The  numbers  of  test  items 
in  these  subtests  range  from  five  to  twenty-five. 
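For readers who want to evaluate (3.6), here is a minimal sketch of the operating characteristics of the graded normal ogive model, written with the standard normal distribution function instead of the integral. The usage lines take item 16 of Table 3-4-1 as an example; the function names are illustrative.

```python
import math


def normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def graded_probs(theta, a, b):
    """Operating characteristics P_x(theta), x = 0, ..., m, of equation (3.6).

    `b` holds the item response difficulty parameters b_1 < ... < b_m;
    the conventions b_0 = -inf and b_{m+1} = +inf of (3.7) are added here.
    """
    bounds = [-math.inf] + list(b) + [math.inf]
    # P_x = Phi(a(theta - b_x)) - Phi(a(theta - b_{x+1}))
    return [normal_cdf(a * (theta - bounds[x])) - normal_cdf(a * (theta - bounds[x + 1]))
            for x in range(len(b) + 1)]


# Item 16 of Table 3-4-1: a = 1.6, b_1 = -1.00, b_2 = 0.00
for theta in (-2.0, -0.5, 1.0):
    print(theta, [round(p, 4) for p in graded_probs(theta, 1.6, (-1.0, 0.0))])
```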

We can write for the item response information function,
$I_{x_g}(\theta)$, such that

(3.8)  $$ I_{x_g}(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_{x_g}(\theta) , $$

and the item information function, $I_g(\theta)$, is given as the conditional
expectation of the item response information function, i.e.,

(3.9)  $$ I_g(\theta) = E[\, I_{x_g}(\theta) \mid \theta \,] = \sum_{x_g=0}^{m_g} I_{x_g}(\theta)\, P_{x_g}(\theta) . $$


TABLE 3-4-1

Item Parameters of the Test Items of Our Old Tests

 Item g    a_g     b_1      b_2
   1       1.8    -4.75    -3.75
   2       1.9    -4.50    -3.50
   3       2.0    -4.25    -3.25
   4       1.5    -4.00    -3.00
   5       1.6    -3.75    -2.75
   6       1.4    -3.50    -2.50
   7       1.9    -3.00    -2.00
   8       1.8    -3.00    -2.00
   9       1.6    -2.75    -1.75
  10       2.0    -2.50    -1.50
  11       1.5    -2.25    -1.25
  12       1.7    -2.00    -1.00
  13       1.5    -1.75    -0.75
  14       1.4    -1.50    -0.50
  15       2.0    -1.25    -0.25
  16       1.6    -1.00     0.00
  17       1.8    -0.75     0.25
  18       1.7    -0.50     0.50
  19       1.9    -0.25     0.75
  20       1.7     0.00     1.00
  21       1.5     0.25     1.25
  22       1.8     0.50     1.50
  23       1.4     0.75     1.75
  24       1.9     1.00     2.00
  25       2.0     1.25     2.25
  26       1.6     1.50     2.50
  27       1.7     1.75     2.75
  28       1.4     2.00     3.00
  29       1.9     2.25     3.25
  30       1.6     2.50     3.50
  31       1.5     2.75     3.75
  32       1.7     3.00     4.00
  33       1.8     3.25     4.25
  34       2.0     3.50     4.50
  35       1.4     3.75     4.75

(The crosses marking the membership of each item in Subtests 1 through 9
are not legible in this copy.)
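Equations (3.8) and (3.9) can be evaluated numerically for the graded normal ogive model. The sketch below approximates the second derivative in (3.8) by a central finite difference (a numerical shortcut, not the procedure of the study) and cross-checks the result against the equivalent form $\sum_{x_g} [\partial P_{x_g}(\theta)/\partial\theta]^2 / P_{x_g}(\theta)$, which follows from (3.9) because the second derivatives of the $P_{x_g}(\theta)$ sum to zero. Function names and the step size are assumptions.

```python
import math


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def category_prob(theta, a, b, x):
    """P_x(theta) of the graded normal ogive model (3.6)."""
    bounds = [-math.inf] + list(b) + [math.inf]
    return Phi(a * (theta - bounds[x])) - Phi(a * (theta - bounds[x + 1]))


def item_response_information(theta, a, b, x, h=1e-4):
    """Equation (3.8), approximated by a central second difference of log P_x."""
    lp = lambda t: math.log(category_prob(t, a, b, x))
    return -(lp(theta + h) - 2.0 * lp(theta) + lp(theta - h)) / (h * h)


def item_information(theta, a, b):
    """Equation (3.9): I_g(theta) = sum over x of I_x(theta) P_x(theta)."""
    return sum(item_response_information(theta, a, b, x) * category_prob(theta, a, b, x)
               for x in range(len(b) + 1))


def item_information_alt(theta, a, b):
    """Equivalent form sum_x [dP_x/dtheta]^2 / P_x."""
    bounds = [-math.inf] + list(b) + [math.inf]
    total = 0.0
    for x in range(len(b) + 1):
        upper, lower = a * (theta - bounds[x]), a * (theta - bounds[x + 1])
        dp = a * (phi(upper) - phi(lower))  # derivative of P_x with respect to theta
        total += dp * dp / category_prob(theta, a, b, x)
    return total


print(item_information(0.3, 1.6, (-1.0, 0.0)), item_information_alt(0.3, 1.6, (-1.0, 0.0)))
```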

The response pattern of the set of $n$ test items is the set of the
$n$ item scores such that

(3.10)  $$ V = (x_1, x_2, \ldots, x_g, \ldots, x_n) . $$

By virtue of local independence (Lord and Novick, 1968, Chapter 16),
the operating characteristic of the response pattern $V$ is given as
the product of the $n$ operating characteristics of the item scores,
so that we have

(3.11)  $$ P_V(\theta) = \prod_{x_g \in V} P_{x_g}(\theta) . $$


(III.5) Set of Five Hundred Maximum Likelihood Estimates

The maximum likelihood estimate of the examinee's ability when each of
the $n$ items follows the normal ogive model can be obtained
numerically (Samejima, 1969, 1972), by using the operating
characteristic $P_V(\theta)$ as the likelihood function. Let $A_{x_g}(\theta)$ be
the basic function of item score $x_g$, which is defined by

(3.14)  $$ A_{x_g}(\theta) = \frac{\partial}{\partial\theta} \log P_{x_g}(\theta) . $$

We can write for the maximum likelihood estimate $\hat\theta_V$ for the
response pattern $V$ such that

(3.15)  $$ \sum_{x_g \in V} A_{x_g}(\hat\theta_V) = 0 . $$

In the normal ogive model, this basic function is a strictly
decreasing function of $\theta$, and so is the sum in (3.15). As $\theta$ tends
to $-\infty$ and to $\infty$, the two asymptotes of this sum are $0$ and $-\infty$
for the lowest extreme response pattern $(0, 0, \ldots, 0)$, $\infty$ and $0$
for the highest extreme response pattern $(m_1, m_2, \ldots, m_n)$, and
$\infty$ and $-\infty$ for all the other, intermediate response patterns.

In our study, for each hypothetical examinee, we generated by the
Monte Carlo method the response pattern of the $n$ test items of the
Old Test, and based upon this response pattern the maximum likelihood
estimate of his ability was obtained. This set of five hundred maximum
likelihood estimates plays an essential role in the calibration of the
operating characteristics of each of our unknown test items.
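The Monte Carlo step and the solution of (3.15) can be sketched as follows, assuming the graded normal ogive model (3.6) for every Old Test item. This is an illustrative reconstruction, not the program used in the study: scores are drawn by inverting the cumulative category probabilities, and (3.15) is solved by bisection on the hypothetical search interval $[-4, 4]$, which works because the sum of basic functions is strictly decreasing. Extreme patterns, whose sums never change sign, return `None`. The seven items come from Table 3-4-1 purely as an example; the seed and the true ability 0.25 are arbitrary.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def category_terms(theta, a, b, x):
    """(P_x(theta), dP_x/dtheta) for the graded normal ogive model (3.6)."""
    bounds = [-math.inf] + list(b) + [math.inf]
    upper, lower = a * (theta - bounds[x]), a * (theta - bounds[x + 1])
    if lower > 0.0:  # both limits in the right tail: use upper-tail areas
        p = 0.5 * math.erfc(lower / SQRT2) - 0.5 * math.erfc(upper / SQRT2)
    else:            # otherwise use lower-tail areas
        p = 0.5 * math.erfc(-upper / SQRT2) - 0.5 * math.erfc(-lower / SQRT2)
    return p, a * (phi(upper) - phi(lower))


def simulate_pattern(theta, items, rng):
    """Monte Carlo draw of one graded score per item at ability theta."""
    pattern = []
    for a, b in items:
        u, cum = rng.random(), 0.0
        for x in range(len(b) + 1):
            cum += category_terms(theta, a, b, x)[0]
            if u < cum:
                break
        pattern.append(x)
    return pattern


def sum_basic_functions(theta, items, pattern):
    """Left-hand side of (3.15): sum over the pattern of A_x = (dP_x/dtheta) / P_x."""
    total = 0.0
    for (a, b), x in zip(items, pattern):
        p, dp = category_terms(theta, a, b, x)
        total += dp / p
    return total


def mle(items, pattern, lo=-4.0, hi=4.0, iters=60):
    """Solve (3.15) by bisection, using the fact that the sum is strictly decreasing."""
    f = lambda t: sum_basic_functions(t, items, pattern)
    if f(lo) <= 0.0 or f(hi) >= 0.0:
        return None  # no root in [lo, hi], e.g. an extreme response pattern
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Seven items taken from Table 3-4-1 (items 12, 14, 16, 18, 20, 22, 24)
ITEMS = [(1.7, (-2.0, -1.0)), (1.4, (-1.5, -0.5)), (1.6, (-1.0, 0.0)),
         (1.7, (-0.5, 0.5)), (1.7, (0.0, 1.0)), (1.8, (0.5, 1.5)), (1.9, (1.0, 2.0))]

rng = random.Random(2024)
pattern = simulate_pattern(0.25, ITEMS, rng)
print(pattern, mle(ITEMS, pattern))
```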

The maximum likelihood estimate has the asymptotic property that,
given $\theta$, the estimate is conditionally unbiased and normally
distributed with $\theta$ and $[I(\theta)]^{-1/2}$ as its two parameters (mean
and standard deviation), where $I(\theta)$ is the test information function.
It has been observed (Samejima, 1975, 1977a, 1977b) that this asymptotic
normal distribution can be used as a good approximation to the
conditional distribution of $\hat\theta_V$, given $\theta$, even when the number of
test items is not very large and the amount of test information is
relatively small. Throughout the present study, this approximation is
used effectively.

(III.6) Unknown Test Items Whose Operating Characteristics Are to
Be Estimated

There are ten hypothetical binary test items, and throughout the
present study our target is the estimation of the operating
characteristic of $x_h = 1$, or the item characteristic function, for
each of these ten binary items. Let $P_h(\theta)$ be the item
characteristic function of the unknown test item $h$. For each item,
this item characteristic function follows the normal ogive model,
such that

(3.16)  $$ P_h(\theta) = [2\pi]^{-1/2} \int_{-\infty}^{a_h(\theta - b_h)} e^{-u^2/2}\, du . $$

The discrimination parameter $a_h$ and the difficulty parameter $b_h$
are shown in Table 3-6-1 for each of these ten binary test items.


TABLE 3-6-1

Item Discrimination Parameter a_h
and Item Difficulty Parameter b_h
of Each of Ten Binary Items

 Item h    a_h     b_h
   1       1.5    -2.5
   2       1.0    -2.0
   3       2.5    -1.5
   4       1.0    -1.0
   5       1.5    -0.5
   6       1.0     0.0
   7       2.0     0.5
   8       1.0     1.0
   9       2.0     1.5
  10       1.0     2.0
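The ten item characteristic functions of (3.16), with the parameters of Table 3-6-1, can be tabulated with a few lines of code; this is a minimal sketch with illustrative names.

```python
import math

# Table 3-6-1: (a_h, b_h) for binary items h = 1, ..., 10
TEN_ITEMS = [(1.5, -2.5), (1.0, -2.0), (2.5, -1.5), (1.0, -1.0), (1.5, -0.5),
             (1.0, 0.0), (2.0, 0.5), (1.0, 1.0), (2.0, 1.5), (1.0, 2.0)]


def icf(theta, a, b):
    """Equation (3.16): normal ogive item characteristic function P_h(theta)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


for h, (a, b) in enumerate(TEN_ITEMS, start=1):
    print(h, [round(icf(t, a, b), 3) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```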



There is no doubt about the need to use a wider variety of operating
characteristics for the unknown test items, including unimodal
functions, functions with non-zero asymptotes, and so on. Because of
the amount of work already done in the present study, however, this
has to wait for future research. The author hopes that other
researchers will become interested in conducting such research, using
the methods and approaches developed in the present study.

(III.7) Use of Robust, Indirect Information

Lord adopted his own method (Lord, 1969) of estimating true score
distributions from the observed score distributions in his attempt
(Lord, 1970) to estimate the item characteristic functions of the SAT
Verbal Test items without assuming any mathematical forms. He
excluded the item under study from the total test in defining the test
score. This direct approach to the operating characteristics does not
require an Old Test, and we can start from the direct observation of
the sample test score distributions. The number of examinees Lord used
in his calibration of the item characteristic functions is 103,275.
This valuable study by Lord provides us with a methodology which we
can use for empirical data such as those found in large institutions
like Educational Testing Service.

There is no question that a large sample size is desirable in the
estimation of the operating characteristics. It is necessary, however,
that we develop methodologies which are applicable to much smaller
groups of examinees. Levine (Levine, 1980) developed a method with
this consideration in mind. Following the present study by the
author, he used an Old Test as the basis of calibrating the operating
characteristics of unknown test items.

In his method, Levine introduced a set of orthonormal eigenfunctions,
the number of which does not exceed the number of all possible
response patterns of the Old Test. In practice, this number is much
less than this maximal value, and it is interesting to note that it
depends not only upon the number of test items in the Old Test but
also upon the number of examinees. In other words, Levine's method
involves a certain trade-off between the number of examinees and that
of test items in the Old Test. He has tried his own method using the
author's simulated data based upon the original Old Test (cf. Section
III.4) and the five hundred hypothetical examinees (cf. Section
III.3), and his results turned out to be successful. He also tried his
method on SAT test items (Levine, 1981), using somewhat larger numbers
of examinees, such as one thousand.

In using a small number of examinees as the basis of the calibration
of operating characteristics, we need some additional information
beyond what is directly observable, such as the observed test score
distribution, the response pattern, and so on. Such indirect
information must be robust to the fluctuation caused by a small sample
size. In the present study, the conditional moments of $\theta$, given its
maximum likelihood estimate $\hat\theta$, serve this purpose. In other words,
instead of approaching the ability distribution directly as is the
case with Lord's method and Levine's method, we focus our attention on
the conditional distribution of ability $\theta$, given its maximum
likelihood estimate $\hat\theta$, or on the bivariate distribution of $\theta$ and
$\hat\theta$. Thus the estimated unconditional ability distribution is obtained
as an aggregate of the estimated conditional density function of $\theta$,
given $\hat\theta$, or in the form of an integral, over $\hat\theta$, of the estimated
bivariate density function of $\theta$ and $\hat\theta$.

Let us assume that the square root of the test information function
of our Old Test is constant over the interval of $\theta$ of our interest,
as is the case with our original Old Test. We shall denote the
conditional density of $\hat\theta$, given ability $\theta$, by $\psi(\hat\theta \mid \theta)$. By
virtue of the asymptotic normality of the conditional distribution of
$\hat\theta$, given $\theta$, $\psi(\hat\theta \mid \theta)$ is approximated by the normal density
function with $\theta$ and $[I(\theta)]^{-1/2}$ as its parameters. Let $\sigma$ denote the
constant value of $[I(\theta)]^{-1/2}$. The first through fourth derivatives
of $\psi(\hat\theta \mid \theta)$ with respect to $\hat\theta$ can be written as follows.




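(3.17)  $$ \frac{\partial}{\partial\hat\theta} \psi(\hat\theta \mid \theta) = -\psi(\hat\theta \mid \theta)\, \sigma^{-2} (\hat\theta - \theta) , $$

(3.18)  $$ \frac{\partial^2}{\partial\hat\theta^2} \psi(\hat\theta \mid \theta) = \psi(\hat\theta \mid \theta)\, \sigma^{-2} \big[ \sigma^{-2} (\hat\theta - \theta)^2 - 1 \big] , $$

(3.19)  $$ \frac{\partial^3}{\partial\hat\theta^3} \psi(\hat\theta \mid \theta) = 3\psi(\hat\theta \mid \theta)\, \sigma^{-4} (\hat\theta - \theta) - \psi(\hat\theta \mid \theta)\, \sigma^{-6} (\hat\theta - \theta)^3 , $$

(3.20)  $$ \frac{\partial^4}{\partial\hat\theta^4} \psi(\hat\theta \mid \theta) = 3\psi(\hat\theta \mid \theta)\, \sigma^{-4} - 6\psi(\hat\theta \mid \theta)\, \sigma^{-6} (\hat\theta - \theta)^2 + \psi(\hat\theta \mid \theta)\, \sigma^{-8} (\hat\theta - \theta)^4 . $$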


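Let $g(\hat\theta)$ be the density function of the maximum likelihood estimate
$\hat\theta$. We can write

(3.21)  $$ g(\hat\theta) = \int_{-\infty}^{\infty} \psi(\hat\theta \mid \theta)\, f(\theta)\, d\theta . $$

Let us assume that this density function, $g(\hat\theta)$, is four times
differentiable. Differentiating (3.21) under the integral sign with the
help of (3.17) through (3.20), we obtain for the conditional expectation
of $\theta$, given $\hat\theta$, and the second, third and fourth conditional moments
of $\theta$ about the mean, given $\hat\theta$,

(3.22)  $$ E(\theta \mid \hat\theta) = \hat\theta + \sigma^2 \frac{d}{d\hat\theta} \log g(\hat\theta) = \hat\theta + \sigma^2 \Big[ \frac{d}{d\hat\theta} g(\hat\theta) \Big] [g(\hat\theta)]^{-1} , $$

(3.23)  $$ \mathrm{Var}(\theta \mid \hat\theta) = \sigma^2 \Big[ 1 + \sigma^2 \frac{d^2}{d\hat\theta^2} \log g(\hat\theta) \Big] = \sigma^2 \Big[ 1 + \sigma^2 \Big\{ g(\hat\theta) \frac{d^2}{d\hat\theta^2} g(\hat\theta) - \Big[ \frac{d}{d\hat\theta} g(\hat\theta) \Big]^2 \Big\} [g(\hat\theta)]^{-2} \Big] , $$

(3.24)  $$ E\big[ \{\theta - E(\theta \mid \hat\theta)\}^3 \mid \hat\theta \big] = \sigma^6 \frac{d^3}{d\hat\theta^3} \log g(\hat\theta) , $$

and

(3.25)  $$ E\big[ \{\theta - E(\theta \mid \hat\theta)\}^4 \mid \hat\theta \big] = \sigma^4 \Big[ 3 + 6\sigma^2 \Big\{ \frac{d^2}{d\hat\theta^2} \log g(\hat\theta) \Big\} + 3\sigma^4 \Big\{ \frac{d^2}{d\hat\theta^2} \log g(\hat\theta) \Big\}^2 + \sigma^4 \Big\{ \frac{d^4}{d\hat\theta^4} \log g(\hat\theta) \Big\} \Big] . $$

For example, by (3.17), $\frac{d}{d\hat\theta} g(\hat\theta) = \sigma^{-2} \int (\theta - \hat\theta)\, \psi(\hat\theta \mid \theta)\, f(\theta)\, d\theta$,
and dividing by $g(\hat\theta)$ gives $\sigma^{-2} [E(\theta \mid \hat\theta) - \hat\theta]$, which is (3.22).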




We can see from the above four formulas that these conditional
moments are specified exclusively by $\hat\theta$, $g(\hat\theta)$ and $\sigma$. Note,
moreover, that if the density function, $g(\hat\theta)$, is estimated, then
these conditional moments are obtainable for any value of $\hat\theta$ within
its meaningful interval. The first through fourth derivatives of
$\log g(\hat\theta)$ can be written as follows.

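(3.26)  $$ \frac{d}{d\hat\theta} \log g(\hat\theta) = \frac{d}{d\hat\theta} g(\hat\theta)\, [g(\hat\theta)]^{-1} , $$

(3.27)  $$ \frac{d^2}{d\hat\theta^2} \log g(\hat\theta) = \Big[ g(\hat\theta) \frac{d^2}{d\hat\theta^2} g(\hat\theta) - \Big\{ \frac{d}{d\hat\theta} g(\hat\theta) \Big\}^2 \Big] [g(\hat\theta)]^{-2} , $$

(3.28)  $$ \frac{d^3}{d\hat\theta^3} \log g(\hat\theta) = \Big[ \{g(\hat\theta)\}^2 \frac{d^3}{d\hat\theta^3} g(\hat\theta) - 3 g(\hat\theta) \frac{d}{d\hat\theta} g(\hat\theta) \frac{d^2}{d\hat\theta^2} g(\hat\theta) + 2 \Big\{ \frac{d}{d\hat\theta} g(\hat\theta) \Big\}^3 \Big] [g(\hat\theta)]^{-3} , $$

(3.29)  $$ \frac{d^4}{d\hat\theta^4} \log g(\hat\theta) = \Big[ \{g(\hat\theta)\}^3 \frac{d^4}{d\hat\theta^4} g(\hat\theta) - 4 \{g(\hat\theta)\}^2 \frac{d}{d\hat\theta} g(\hat\theta) \frac{d^3}{d\hat\theta^3} g(\hat\theta) - 3 \{g(\hat\theta)\}^2 \Big\{ \frac{d^2}{d\hat\theta^2} g(\hat\theta) \Big\}^2 + 12\, g(\hat\theta) \Big\{ \frac{d}{d\hat\theta} g(\hat\theta) \Big\}^2 \frac{d^2}{d\hat\theta^2} g(\hat\theta) - 6 \Big\{ \frac{d}{d\hat\theta} g(\hat\theta) \Big\}^4 \Big] [g(\hat\theta)]^{-4} . $$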


We notice that, since $\sigma$ is obtained as the reciprocal of the square
root of the test information function of the Old Test, all we need is
to estimate the density function $g(\hat\theta)$ from the set of $N$ maximum
likelihood estimates, $\hat\theta_s$ $(s = 1, 2, \ldots, N)$, with the consideration
of making the resultant density function four times differentiable.
This can be done by using the method of moments (Elderton and Johnson,
1969), and approximating the density function $g(\hat\theta)$ by a polynomial.
The rationale behind this method will be given in Chapter 4.
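The practical value of (3.22) and (3.23) is that the conditional moments of $\theta$ require only $g(\hat\theta)$ and $\sigma$. The following sketch checks this numerically under assumptions chosen for convenience: $f(\theta)$ is the uniform density of Section III.3, $\psi(\hat\theta \mid \theta)$ is exactly normal with a hypothetical constant $\sigma = 0.3$, $g(\hat\theta)$ is computed exactly from (3.21) instead of being fitted as a polynomial to the five hundred estimates, and its derivatives are taken by finite differences. Under these assumptions the conditional distribution of $\theta$ given $\hat\theta$ is a truncated normal, which gives an independent check.

```python
import math

SIGMA = 0.3            # hypothetical constant [I(theta)]^(-1/2), not a value from the report
LOW, HIGH = -2.5, 2.5  # uniform ability distribution of Section III.3 (density 0.2)


def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def g(t_hat):
    """Equation (3.21), in closed form for a uniform f and a normal psi."""
    return 0.2 * (Phi((HIGH - t_hat) / SIGMA) - Phi((LOW - t_hat) / SIGMA))


def log_g_derivatives(t_hat, h=1e-3):
    """First and second derivatives of log g by central differences."""
    lm, l0, lp = (math.log(g(t_hat + d)) for d in (-h, 0.0, h))
    return (lp - lm) / (2.0 * h), (lp - 2.0 * l0 + lm) / (h * h)


def moments_from_g(t_hat):
    """Conditional mean and variance of theta given t_hat via (3.22) and (3.23)."""
    d1, d2 = log_g_derivatives(t_hat)
    return t_hat + SIGMA ** 2 * d1, SIGMA ** 2 * (1.0 + SIGMA ** 2 * d2)


def moments_direct(t_hat):
    """The same moments computed directly: under these assumptions, theta given
    t_hat is normal(t_hat, SIGMA^2) truncated to (LOW, HIGH)."""
    al, be = (LOW - t_hat) / SIGMA, (HIGH - t_hat) / SIGMA
    z = Phi(be) - Phi(al)
    r = (phi(al) - phi(be)) / z
    mean = t_hat + SIGMA * r
    var = SIGMA ** 2 * (1.0 + (al * phi(al) - be * phi(be)) / z - r * r)
    return mean, var


for t_hat in (0.0, 2.2, 2.6):
    print(t_hat, moments_from_g(t_hat), moments_direct(t_hat))
```

Near the ends of the ability interval the two moments depart markedly from $\hat\theta$ and $\sigma^2$, which is where this use of $g(\hat\theta)$ matters most.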




(III.8) Transformation of Ability $\theta$ to $\tau$

We notice that the relatively simple formulas, (3.22) through (3.25),
for the conditional moments of ability $\theta$, given its maximum
likelihood estimate $\hat\theta$, hold only when the square root of the test
information function is constant over the interval of ability of our
interest, as is the case with our original Old Test. As we have seen
earlier (cf. Section III.4), for all the other nine Old Tests, i.e.,
subtests of the original Old Test, the square root of the test
information function is not constant. When we use one of these nine
subtests as our Old Test, therefore, (3.22) through (3.25) no longer
hold as they are. This problem can be solved by transforming $\theta$ in
such a way that the resultant transformed latent trait $\tau$ has a
constant value for the square root of the test information function,
$[I^*(\tau)]^{1/2}$, over the meaningful interval of $\tau$.

Let $\tau$ be a function of $\theta$, such that

(3.30)  $$ \tau = \tau(\theta) , $$

which is strictly increasing in $\theta$. The operating characteristic,
$P^*_{x_g}(\tau)$, of the item response $x_g$ defined for the transformed latent
trait $\tau$ equals the original operating characteristic, $P_{x_g}(\theta)$, which
is obvious from its definition as a conditional probability. Thus we
can write

(3.31)  $$ P^*_{x_g}(\tau) = P^*_{x_g}[\tau(\theta)] = P_{x_g}(\theta) . $$

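From (3.31) and (3.8), we can write for the item response information
function, $I^*_{x_g}(\tau)$, such that

(3.32)  $$ I^*_{x_g}(\tau) = -\frac{\partial^2}{\partial\tau^2} \log P^*_{x_g}(\tau) = I_{x_g}(\theta) \Big[ \frac{d\theta}{d\tau} \Big]^2 - \frac{\partial}{\partial\theta} \log P_{x_g}(\theta)\, \frac{d^2\theta}{d\tau^2} . $$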


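From this result, we have for the item information function $I^*_g(\tau)$,

(3.33)  $$ I^*_g(\tau) = \sum_{x_g=0}^{m_g} I^*_{x_g}(\tau)\, P^*_{x_g}(\tau) = I_g(\theta) \Big[ \frac{d\theta}{d\tau} \Big]^2 , $$

since the second term of (3.32), weighted by $P_{x_g}(\theta)$ and summed over
$x_g$, becomes $-\frac{d^2\theta}{d\tau^2} \sum_{x_g} \frac{\partial}{\partial\theta} P_{x_g}(\theta)$, and

(3.34)  $$ \sum_{x_g=0}^{m_g} \frac{\partial}{\partial\theta} P_{x_g}(\theta) = 0 . $$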

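It can be seen that, with the response pattern $V$, we obtain similar
results, such that

(3.35)  $$ P^*_V(\tau) = P^*_V[\tau(\theta)] = P_V(\theta) $$

for the operating characteristic, $P^*_V(\tau)$, and

(3.36)  $$ I^*_V(\tau) = I_V(\theta) \Big[ \frac{d\theta}{d\tau} \Big]^2 - \frac{\partial}{\partial\theta} \log P_V(\theta)\, \frac{d^2\theta}{d\tau^2} $$

for the information function, $I^*_V(\tau)$. We can write for the test
information function $I^*(\tau)$, either from (3.36) or from (3.33), such
that

(3.37)  $$ I^*(\tau) = I(\theta) \Big[ \frac{d\theta}{d\tau} \Big]^2 , $$

and, since $\tau$ is a strictly increasing function of $\theta$, we have

(3.38)  $$ [I^*(\tau)]^{1/2} = [I(\theta)]^{1/2} \frac{d\theta}{d\tau} . $$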

Let $C$ be an arbitrary constant for the square root of the test
information function, $[I^*(\tau)]^{1/2}$. From (3.38) we can write

(3.39)  $$ \frac{d\tau}{d\theta} = C^{-1} [I(\theta)]^{1/2} . $$

Thus we obtain for the transformation of $\theta$ to $\tau$ such that

(3.40)  $$ \tau = C^{-1} \int [I(\theta)]^{1/2}\, d\theta + d , $$

where $d$ is an arbitrary constant for adjusting the origin of $\tau$.

In practice, this transformation is much simplified if we approximate
the function $[I(\theta)]^{1/2}$ by a polynomial of an appropriate degree,
using the method of moments. The details of this process will be given
in Chapter 5.

We can write for the density function, $f^*(\tau)$, of the transformed
ability

(3.41)  $$ f^*(\tau) = f(\theta) \frac{d\theta}{d\tau} . $$

This equation indicates that the new density function thus obtained is
no longer uniform, unlike our density function of $\theta$, because
$d\theta/d\tau = C [I(\theta)]^{-1/2}$ is not constant for these subtests.
Figure 3-7-1 illustrates two examples of $f^*(\tau)$ as the results of the
transformation of $\theta$ to $\tau$, which are based upon Subtests 1 and 2,
respectively.


FIGURE 3-7-1

Density Function, $f^*(\tau)$, of $\tau$ Transformed from $\theta$ by the Polynomial
of Degree 8 (Solid Curve), in Contrast to the Original Density Function
$f(\theta)$ (Dotted Curve), when We Used Subtest 1 (Left) and Subtest 2
(Right) as Our Old Test, Respectively.


The maximum likelihood estimate, $\hat\theta$, of ability $\theta$, which is based
upon the response pattern $V$, can be obtained by using the operating
characteristic $P_V(\theta)$ as the likelihood function. In a similar manner,
the corresponding maximum likelihood estimate, $\hat\tau$, can be obtained by
using $P^*_V(\tau)$ as the likelihood function. By virtue of the invariance
of the maximum likelihood estimator under transformation, however, this
second maximum likelihood estimate can also be obtained by the direct
transformation of $\hat\theta$, such that

(3.42)  $$ \hat\tau = \tau(\hat\theta) $$

(cf. Samejima, 1969).


REFERENCES 


[1] Elderton, W. P. and N. L. Johnson. Systems of frequency curves.
    Cambridge University Press, 1969.

[2] Levine, M. Appropriateness measurement and the formula-score
    method: overview, intercorrelations and interpretations. Paper
    presented at the ONR Conference on Model-Based Psychological
    Measurement, 1980, Iowa City, Iowa.

[3] Levine, M. Ability distribution measurement for short and complex
    tests. Paper presented at the ONR Conference on Model-Based
    Psychological Measurement, 1981, Millington, Tennessee.

[4] Lord, F. M. Estimating true-score distributions in psychological
    testing (an empirical Bayes estimation problem). Psychometrika,
    1969, 34, 259-299.

[5] Lord, F. M. Item characteristic curves estimated without knowledge
    of their mathematical form: a confrontation of Birnbaum's logistic
    model. Psychometrika, 1970, 35, 43-50.

[6] Lord, F. M. and M. R. Novick. Statistical theories of mental test
    scores. Reading, Mass.: Addison-Wesley, 1968.

[7] Samejima, F. Estimation of latent ability using a response pattern
    of graded scores. Psychometrika Monograph, No. 17, 1969.

[8] Samejima, F. A general model for free-response data. Psychometrika
    Monograph, No. 18, 1972.


The method of moments (Elderton and Johnson, 1969) was frequently used
in the present study, and on many occasions it played an important
role. In some situations we fitted Pearson type density functions, and
in many other situations we used polynomials. It should be noted that,
when we adopt a polynomial to approximate a density function, there is
a possibility that, for some range of the variable, the estimated
density turns out to be negative. In practice, however, this seldom
happened, and, even when it did, it did not seriously affect the
process or the result of our estimation. Since the polynomial is less
restrictive in its shape than many other functions which have the same
number of parameters, and, in addition, its derivatives are given as
even simpler polynomials, the method of moments for fitting a
polynomial looks promising.

In this chapter, the rationale behind the success of using polynomials
fitted by the method of moments is described, and some observations
are made. This part of the present final report is mainly drawn from
the research report RR-79-2, which includes the fine effort of one of
the author's assistants, Philip Livingston.

(IV.1) Approximation to the Density Function from a Set of Observations

The method of moments was originally developed to graduate an observed
frequency distribution by assuming some specific mathematical function
and fitting the observed moments up to a specified order. This can
readily be extended to the case in which we wish to estimate a density
function from a set of observations, rather than a frequency
distribution.

Let $\mu_2$, $\mu_3$ and $\mu_4$ denote the second, third and fourth moments
about the mean of some distribution. If we assume that the distribution
belongs to the Pearson system, then the criterion $\kappa$, which is defined
by

(4.1)  $$ \kappa = \beta_1 (\beta_2 + 3)^2 \big[ 4 (2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1) \big]^{-1} , $$

where $\beta_1$ and $\beta_2$ are obtained as the ratios

(4.2)  $$ \beta_1 = \mu_3^2\, \mu_2^{-3} $$

and

(4.3)  $$ \beta_2 = \mu_4\, \mu_2^{-2} , $$

plays an important role. Substituting the sample moments for $\mu_2$, $\mu_3$
and $\mu_4$ in (4.2) and (4.3), and then using (4.1), we can evaluate
Pearson's criterion $\kappa$, and, according to its value, we decide to which
type of the Pearson system our distribution belongs. If, for instance,
$\kappa$ turns out to be negative and finite, then the distribution is of
Pearson's Type I; if it turns out that $\kappa = 0$, $\beta_1 = 0$ and $\beta_2 < 3$,
then our distribution is of Pearson's Type II; and so on.
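Pearson's criterion is straightforward to compute from a sample. This sketch evaluates (4.1) through (4.3) and distinguishes only the two cases named above. The two example samples are illustrative choices: the one hundred ability levels of Section III.3, and a skewed sample of squared uniform values whose exact population values are $\beta_1 = 20/49$, $\beta_2 = 15/7$ and $\kappa = -1/8$.

```python
import math


def central_moments(sample):
    """Sample second, third and fourth moments about the mean."""
    n = len(sample)
    mean = sum(sample) / n
    return tuple(sum((x - mean) ** r for x in sample) / n for r in (2, 3, 4))


def pearson_criterion(sample):
    """Equations (4.1)-(4.3): returns (beta_1, beta_2, kappa)."""
    mu2, mu3, mu4 = central_moments(sample)
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    denom = 4.0 * (2.0 * beta2 - 3.0 * beta1 - 6.0) * (4.0 * beta2 - 3.0 * beta1)
    if denom == 0.0:
        return beta1, beta2, math.nan  # criterion undefined for these exact values
    return beta1, beta2, beta1 * (beta2 + 3.0) ** 2 / denom


def pearson_type(beta1, beta2, kappa, tol=1e-8):
    """Only the two cases named in the text are distinguished here."""
    if abs(kappa) < tol and beta1 < tol and beta2 < 3.0:
        return "Type II"
    if math.isfinite(kappa) and kappa < 0.0:
        return "Type I"
    return "another Pearson type"


grid = [-2.475 + 0.05 * i for i in range(100)]              # symmetric and flat
skewed = [((i + 0.5) / 1000.0) ** 2 for i in range(1000)]  # bounded and skewed
for sample in (grid, skewed):
    b1, b2, k = pearson_criterion(sample)
    print(round(b1, 4), round(b2, 4), round(k, 4), pearson_type(b1, b2, k))
```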

Figure 4-1-1 shows the set of five hundred maximum likelihood estimates, $\hat{\theta}$, which was introduced in Section III.5 of the preceding chapter, in the summarized form of a frequency distribution. Also presented in the same figure, by a dotted line, is the theoretical frequency function of the maximum likelihood estimate $\hat{\theta}$, which was obtained from (3.21) using the uniform density (cf. Section III.3) for $f(\theta)$ and $N(\theta, \sigma)$ for $\psi(\hat{\theta} \mid \theta)$.

Pearson's criterion $\kappa$ and the values of $\beta_1$ and $\beta_2$ indicated that our distribution belongs to Type II, and the Type II frequency function obtained by the method of moments is drawn by a solid line in Figure 4-1-1.


FIGURE 4-1-1

Frequency Distribution of the Five Hundred Maximum Likelihood Estimates (Histogram), Pearson's Type II Frequency Function Fitted by the Method of Moments (Solid Curve) and the Theoretical Frequency Function of the Maximum Likelihood Estimate, $\hat{\theta}$.


In contrast to this result, Figure 4-1-2 presents similar results obtained by approximating the frequency function by polynomials of degrees 3, 4 and 5, respectively, using the method of moments. Comparison of these results with Figure 4-1-1 may lead us to prefer polynomials to Pearson-type frequency functions because of their flexibility in shape. This is especially obvious when we compare the Type II frequency function with the polynomial of degree 4, since in both of them the first through fourth moments were fitted.

FIGURE 4-1-2

Frequency Distribution of the Five Hundred Maximum Likelihood Estimates (Histogram), the Polynomials Fitted by the Method of Moments (Solid Curves) and the Theoretical Frequency Function of the Maximum Likelihood Estimate, $\hat{\theta}$. The Three Polynomials Are of Degrees 3, 4 and 5, Respectively.

Figure 4-1-3 illustrates two polynomials fitted by the method of moments to each of two sets of observations. The figure belongs to the combination of the Two-Parameter Beta Method and the Curve Fitting Approach, Degree 3 Case, which will be introduced in the following chapter. A total of 2,500 observations of $\theta$, produced by the Monte Carlo method, were classified into two groups, the success group and the failure group, for an unknown binary test item, item 4. These two subsets of observations are shown in Figure 4-1-3 in the summarized form of frequency distributions, by thick and thin lines, respectively. For each subset, polynomials of degrees 3 and 4 were fitted by the method of moments; they are shown by long dashes and by dots, respectively.


FIGURE 4-1-3

Relative Frequencies of $\theta$ Shared by the Success (Thick Line) and the Failure (Thin Line) Groups and the Corresponding Polynomials of Degree 3 (Long Dashes) and of Degree 4 (Dots) for Item 4. Two-Parameter Beta Method and Curve Fitting Approach, Degree 3 Case.


(IV.2) Method of Moments As the Least Squares Solution for Fitting a Polynomial

Let h(t) be any function of the variable t which is defined on a closed interval, $[\underline{t}, \bar{t}]$, is integrable in the Lebesgue sense, and has its first m moments. This function h(t) can be some specified mathematical function or an empirically obtained function. Let $a_i$ (i = 0,1,2,...,m) be the i-th coefficient of the polynomial

$$ \sum_{i=0}^{m} a_i t^i, \tag{4.4} $$

which is to be fitted to the function h(t) following the least squares principle. We define Q such that

$$ 2Q = \int_{\underline{t}}^{\bar{t}} \Big[h(t) - \sum_{i=0}^{m} a_i t^i\Big]^2 dt. \tag{4.5} $$

Differentiating Q with respect to $a_r$ and setting the result equal to zero, we obtain

$$ \int_{\underline{t}}^{\bar{t}} \Big[h(t) - \sum_{i=0}^{m} a_i t^i\Big]\, t^r\, dt = 0 \tag{4.6} $$

and then

$$ \int_{\underline{t}}^{\bar{t}} t^r h(t)\, dt = \int_{\underline{t}}^{\bar{t}} t^r \sum_{i=0}^{m} a_i t^i\, dt \tag{4.7} $$

for r = 0,1,2,...,m.

Thus it is obvious from (4.6) and (4.7) that the least squares principle requires the resultant polynomial of degree m to have the same 0-th through m-th moments as h(t), which is nothing but the principle upon which the method of moments is based. Both methods therefore provide us with the same polynomial.
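The equivalence can be checked numerically. In the sketch below, which is an illustration and not the program used in the study, both integrals are replaced by the same Gauss-Legendre quadrature (an assumption made so that the two routes see identical integrals): `fit_by_moments` solves the moment equations (4.7), while `fit_by_least_squares` minimizes the discretized $2Q$ of (4.5) with a general least squares solver.

```python
import numpy as np

def gauss_legendre(lo, hi, n=40):
    """Nodes and weights of n-point Gauss-Legendre quadrature on [lo, hi]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (hi - lo) * x + 0.5 * (hi + lo), 0.5 * (hi - lo) * w

def fit_by_moments(h, lo, hi, m):
    """Polynomial whose 0-th..m-th moments equal those of h on [lo, hi], eq. (4.7)."""
    t, w = gauss_legendre(lo, hi)
    V = np.vander(t, m + 1, increasing=True)     # V[k, i] = t_k**i
    mu = V.T @ (w * h(t))                        # moments of h about the origin
    G = V.T @ (w[:, None] * V)                   # integrals of t**(r + i)
    return np.linalg.solve(G, mu)

def fit_by_least_squares(h, lo, hi, m):
    """Minimise the quadrature approximation of 2Q in eq. (4.5) directly."""
    t, w = gauss_legendre(lo, hi)
    V = np.vander(t, m + 1, increasing=True)
    s = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(V * s[:, None], h(t) * s, rcond=None)
    return coef

a_mom = fit_by_moments(np.exp, 0.0, 1.0, 3)
a_ls = fit_by_least_squares(np.exp, 0.0, 1.0, 3)
print(np.max(np.abs(a_mom - a_ls)))              # of the order of rounding error
```

The two coefficient vectors agree up to rounding error, and a polynomial target of degree at most m is reproduced exactly by either route.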




When the function h(t) is observed only at N points of the variable t, as is often the case for an empirically observed function, we can replace (4.5) by

$$ 2Q = \sum_{k=1}^{N} \Big[h(t_k) - \sum_{i=0}^{m} a_i t_k^{\,i}\Big]^2 w(t_k), \tag{4.8} $$

where $w(t_k)$ is some appropriately chosen weight for $t_k$. Differentiating (4.8) with respect to $a_r$ and setting the result equal to zero, we obtain

$$ \sum_{k=1}^{N} t_k^{\,r}\, h(t_k)\, w(t_k) = \sum_{k=1}^{N} t_k^{\,r}\, w(t_k) \sum_{i=0}^{m} a_i t_k^{\,i}, \qquad r = 0,1,\dots,m. \tag{4.9} $$

If the function h(t) is continuous and we divide the interval $[\underline{t}, \bar{t}]$ into N subintervals, then by the mean value theorem for integrals there exists at least one value, $\xi_{kr}$, in each subinterval $(\underline{t}_k, \bar{t}_k)$ which satisfies

$$ \int_{\underline{t}_k}^{\bar{t}_k} t^r h(t)\, dt = \xi_{kr}^{\,r}\, h(\xi_{kr})\, (\bar{t}_k - \underline{t}_k), \tag{4.10} $$

where

$$ \bar{t}_k = \underline{t}_{k+1}, \qquad k = 1,2,\dots,N-1, \tag{4.11} $$

and

$$ \underline{t}_1 = \underline{t}, \qquad \bar{t}_N = \bar{t}. \tag{4.12} $$

When the width of each subinterval is small enough, these (m+1) values, $\xi_{kr}$ (r = 0,1,2,...,m), can be approximated by a single value, say the midpoint of the subinterval. Using such a value as $t_k$ and the subinterval width as $w(t_k)$, we can approximate (4.7) by (4.9). If all the subinterval widths are equal, the common weight cancels and (4.9) simplifies to

$$ \sum_{k=1}^{N} t_k^{\,r}\, h(t_k) = \sum_{k=1}^{N} t_k^{\,r} \sum_{i=0}^{m} a_i t_k^{\,i}. \tag{4.13} $$
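A sketch of the equal-width case (4.13), assuming h can be evaluated at the midpoints of N equal subintervals; `fit_on_midpoints` is a hypothetical helper name. If the widths were unequal, each term of the sums would instead carry the weight $w(t_k)$ of (4.9).

```python
import numpy as np

def fit_on_midpoints(h, lo, hi, m, N):
    """Solve eq. (4.13): N equal subintervals of [lo, hi], h sampled at the midpoints."""
    edges = np.linspace(lo, hi, N + 1)
    t = 0.5 * (edges[:-1] + edges[1:])           # midpoints t_k
    V = np.vander(t, m + 1, increasing=True)     # V[k, i] = t_k**i
    lhs = V.T @ V                                # sum_k t_k**(r + i)
    rhs = V.T @ h(t)                             # sum_k t_k**r * h(t_k)
    return np.linalg.solve(lhs, rhs)

# A cubic target is recovered exactly: coefficients close to [1, 2, 0, -1].
print(fit_on_midpoints(lambda t: 1 + 2*t - t**3, -1.0, 2.0, 3, 50))
```

When h is itself a polynomial of degree at most m the fit is exact for any N of at least m+1; otherwise the coefficients approach the continuous solution of (4.7) as the subintervals shrink.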


(IV.3) Direct Use of the Least Squares Solution

We can rewrite (4.7) in the form

$$ \mu'_s = \sum_{j=1}^{m+1} a_{j-1}\,(j+s-1)^{-1}\left[\bar{t}^{\,j+s-1} - \underline{t}^{\,j+s-1}\right], \tag{4.14} $$

where s = r+1 = 1,2,...,m+1, j = i+1 = 1,2,...,m+1, and $\mu'_s$ is the (s-1)-th moment of h(t) about the origin, defined by

$$ \mu'_s = \int_{\underline{t}}^{\bar{t}} t^{s-1} h(t)\, dt. \tag{4.15} $$

Let $\mathbf{a}$ be a column vector of order (m+1) whose j-th element is $a_{j-1}$, and $\boldsymbol{\mu}'$ be a column vector of the same order whose s-th element is $\mu'_s$. Thus we can rewrite (4.14) in matrix notation to obtain

$$ \boldsymbol{\mu}' = A\,\mathbf{a}, \tag{4.16} $$

where A is a symmetric matrix of order (m+1) whose sj-element is given by

$$ (j+s-1)^{-1}\left[\bar{t}^{\,j+s-1} - \underline{t}^{\,j+s-1}\right]. \tag{4.17} $$

The least squares solution for $\mathbf{a}$ is obtained, therefore, by

$$ \hat{\mathbf{a}} = A^{-1}\boldsymbol{\mu}'. \tag{4.18} $$

For the purpose of illustration, the matrix A for m = 2 is shown below.

$$ A = \begin{bmatrix} \bar{t}-\underline{t} & (\bar{t}^2-\underline{t}^2)/2 & (\bar{t}^3-\underline{t}^3)/3 \\ (\bar{t}^2-\underline{t}^2)/2 & (\bar{t}^3-\underline{t}^3)/3 & (\bar{t}^4-\underline{t}^4)/4 \\ (\bar{t}^3-\underline{t}^3)/3 & (\bar{t}^4-\underline{t}^4)/4 & (\bar{t}^5-\underline{t}^5)/5 \end{bmatrix} \tag{4.19} $$

In practice we usually use a larger value of m; obtaining the inverse of A is then the most intricate part of the computation, and a package program for inverting a symmetric matrix becomes a necessity.
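The matrix (4.17) is easy to build, and doing so shows why its inversion is the delicate step: for a wide interval its entries span many orders of magnitude and its condition number grows quickly with the interval width. In this sketch, `moment_matrix` and `direct_least_squares` are illustrative names, and NumPy's `np.linalg.solve` stands in for the package program mentioned above (it solves the system without forming $A^{-1}$ explicitly).

```python
import numpy as np

def moment_matrix(lo, hi, m):
    """Matrix A of eq. (4.17): element (s, j) = (hi**(j+s-1) - lo**(j+s-1)) / (j+s-1)."""
    idx = np.arange(1, m + 2)                    # 1-based s and j
    e = idx[:, None] + idx[None, :] - 1          # exponent j + s - 1
    return (float(hi) ** e - float(lo) ** e) / e

def direct_least_squares(mu_origin, lo, hi):
    """Eq. (4.18): coefficients a from the moments mu'_1..mu'_{m+1} about the origin."""
    m = len(mu_origin) - 1
    return np.linalg.solve(moment_matrix(lo, hi, m), np.asarray(mu_origin, dtype=float))

# Condition numbers for m = 7 on a moderate and on a wide interval.
for lo, hi in [(-3.0, 3.0), (-6.0, 6.0)]:
    print((lo, hi), np.linalg.cond(moment_matrix(lo, hi, 7)))
```

Comparing the two printed condition numbers gives a feel for the difficulty that Section IV.7 attributes to the limited capacity of the computer in inverting A.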




(IV.4) Solution by the Method of Moments

Let R(t) be half the width of the interval on which the function h(t) is defined, and M(t) be the midpoint of the interval, such that

$$ R(t) = (\bar{t} - \underline{t})/2 \tag{4.20} $$

and

$$ M(t) = (\bar{t} + \underline{t})/2. \tag{4.21} $$

For convenience, we define a new variable $t^*$ by moving the origin of t to the midpoint of the interval $[\underline{t}, \bar{t}]$, i.e.,

$$ t^* = t - M(t). \tag{4.22} $$

Thus the polynomial of degree m in t can be rewritten as a polynomial of the same degree in $t^*$, or

$$ \sum_{i=0}^{m} a_i t^i = \sum_{i=0}^{m} \alpha_i t^{*i}, \tag{4.23} $$

with the relationship between the two sets of coefficients such that

$$ a_r = \begin{cases} \alpha_r, & M(t) = 0, \\ \displaystyle\sum_{i=r}^{m} (-1)^{i-r} \binom{i}{r} [M(t)]^{i-r}\, \alpha_i, & \text{otherwise}, \end{cases} \qquad r = 0,1,2,\dots,m. \tag{4.24} $$
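Equation (4.24) is the binomial expansion of $(t - M(t))^i$ collected by powers of t. The short sketch below (hypothetical function name, standard library only) converts the coefficients $\alpha_i$ in $t^*$ to the coefficients $a_r$ in t; because Python evaluates `0 ** 0` as 1, the case M(t) = 0 reduces to $a_r = \alpha_r$ without a special branch.

```python
from math import comb

def coefficients_about_origin(alpha, M):
    """Eq. (4.24): a_r = sum_{i=r}^{m} (-1)**(i-r) * C(i, r) * M**(i-r) * alpha_i."""
    m = len(alpha) - 1
    return [sum((-1) ** (i - r) * comb(i, r) * M ** (i - r) * alpha[i]
                for i in range(r, m + 1))
            for r in range(m + 1)]

# 1 + 2 t* + 3 t*^2 with t* = t - 2 equals 9 - 10 t + 3 t^2.
print(coefficients_about_origin([1, 2, 3], 2))   # [9, -10, 3]
```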


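Writing $\mu^*_j = \int_{\underline{t}}^{\bar{t}} [t - M(t)]^j h(t)\, dt$ for the j-th moment about the midpoint M(t), the following relationships hold between these moments and the coefficients $\alpha_r$ (r = 0,1,2,...,m):

$$ \mu^*_{2g} = 2\sum_{k=0}^{[m/2]} \alpha_{2k}\, [2g+2k+1]^{-1}\, [R(t)]^{2g+2k+1}, \qquad g = 0,1,2,\dots,[m/2], \tag{4.25} $$

$$ \mu^*_{2g+1} = 2\sum_{k=0}^{[(m-1)/2]} \alpha_{2k+1}\, [2g+2k+3]^{-1}\, [R(t)]^{2g+2k+3}, \qquad g = 0,1,2,\dots,[(m-1)/2]. \tag{4.26} $$

In the above two equations, [ ] indicates the integer part of the number, and $\mu^*_{2g}$ and $\mu^*_{2g+1}$ indicate even and odd moments about the midpoint M(t), respectively.

Let p = g+1 and q = k+1. We define the following two symmetric matrices, $B_{(0)}$ and $B_{(1)}$, whose orders are both (m+1)/2 when m is odd, and (m/2)+1 and m/2, respectively, when m is even:

$$ B_{(0)} = \left\{ [R(t)]^{2(p+q)-3}\, [2(p+q)-3]^{-1} \right\}, \tag{4.27} $$

$$ B_{(1)} = \left\{ [R(t)]^{2(p+q)-1}\, [2(p+q)-1]^{-1} \right\}. \tag{4.28} $$

Let $\mu^*_{(0)}$ and $\mu^*_{(1)}$ be column vectors of the corresponding orders, such that

$$ \mu^*_{(0)} = \{\mu^*_{2(p-1)}\}, \qquad p = 1,2,\dots,[m/2]+1, \tag{4.29} $$

$$ \mu^*_{(1)} = \{\mu^*_{2p-1}\}, \qquad p = 1,2,\dots,[(m+1)/2]. \tag{4.30} $$

Let $\alpha_{(0)}$ and $\alpha_{(1)}$ denote the coefficient vectors of the corresponding orders, which can be written as

$$ \alpha_{(0)} = \{\alpha_{2(q-1)}\}, \qquad q = 1,2,\dots,[m/2]+1, \tag{4.31} $$

$$ \alpha_{(1)} = \{\alpha_{2q-1}\}, \qquad q = 1,2,\dots,[(m+1)/2]. \tag{4.32} $$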


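Thus we can rewrite (4.25) and (4.26) in matrix notation such that

$$ \mu^*_{(0)} = 2B_{(0)}\,\alpha_{(0)} \tag{4.33} $$

and

$$ \mu^*_{(1)} = 2B_{(1)}\,\alpha_{(1)}. \tag{4.34} $$

The coefficient vectors $\alpha_{(0)}$ and $\alpha_{(1)}$ are obtained, therefore, by

$$ \alpha_{(0)} = \tfrac{1}{2} B_{(0)}^{-1}\, \mu^*_{(0)} \tag{4.35} $$

and

$$ \alpha_{(1)} = \tfrac{1}{2} B_{(1)}^{-1}\, \mu^*_{(1)}. \tag{4.36} $$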

In practice, the computation is facilitated if we define two matrices, $C_{(0)}$ and $C_{(1)}$, of orders [m/2]+1 and [(m+1)/2], respectively, such that

$$ C_{(0)} = \left\{ [2(p+q)-3]^{-1} \right\} \tag{4.37} $$

and

$$ C_{(1)} = \left\{ [2(p+q)-1]^{-1} \right\}, \tag{4.38} $$

and obtain the two matrices $\tfrac{1}{2}C_{(0)}^{-1}$ and $\tfrac{1}{2}C_{(1)}^{-1}$, which do not depend on a specific set of data but only upon the degree of the polynomial. From these two matrices we can obtain $\tfrac{1}{2}B_{(0)}^{-1}$ and $\tfrac{1}{2}B_{(1)}^{-1}$. Since $B_{(0)}$ is $C_{(0)}$ with its (p, q) element multiplied by $[R(t)]^{2(p+q)-3}$, and likewise $B_{(1)}$ with $[R(t)]^{2(p+q)-1}$, it is easily seen that $\tfrac{1}{2}B_{(0)}^{-1}$ and $\tfrac{1}{2}B_{(1)}^{-1}$ are obtained by dividing the element in the p-th row and q-th column of $\tfrac{1}{2}C_{(0)}^{-1}$ and $\tfrac{1}{2}C_{(1)}^{-1}$ by $[R(t)]^{2(p+q)-3}$ and $[R(t)]^{2(p+q)-1}$, respectively, for every combination of p and q. The resultant sets of equations for obtaining the coefficients $\alpha_i$ are listed below for the polynomials of degrees 3, 4, 5, 6 and 7.


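(i) Polynomial of Degree 3

$$
\begin{aligned}
\alpha_0 &= 1.125\,\mu^*_0/R - 1.875\,\mu^*_2/R^3\\
\alpha_1 &= 9.375\,\mu^*_1/R^3 - 13.125\,\mu^*_3/R^5\\
\alpha_2 &= -1.875\,\mu^*_0/R^3 + 5.625\,\mu^*_2/R^5\\
\alpha_3 &= -13.125\,\mu^*_1/R^5 + 21.875\,\mu^*_3/R^7
\end{aligned}
\tag{4.39}
$$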


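(ii) Polynomial of Degree 4

$$
\begin{aligned}
\alpha_0 &= 1.7578125\,\mu^*_0/R - 8.203125\,\mu^*_2/R^3 + 7.3828125\,\mu^*_4/R^5\\
\alpha_1 &= 9.375\,\mu^*_1/R^3 - 13.125\,\mu^*_3/R^5\\
\alpha_2 &= -8.203125\,\mu^*_0/R^3 + 68.90625\,\mu^*_2/R^5 - 73.828125\,\mu^*_4/R^7\\
\alpha_3 &= -13.125\,\mu^*_1/R^5 + 21.875\,\mu^*_3/R^7\\
\alpha_4 &= 7.3828125\,\mu^*_0/R^5 - 73.828125\,\mu^*_2/R^7 + 86.1328125\,\mu^*_4/R^9
\end{aligned}
\tag{4.40}
$$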


(iii) Polynomial of Degree 5

$$
\begin{aligned}
\alpha_0 &= 1.7578125\,\mu^*_0/R - 8.203125\,\mu^*_2/R^3 + 7.3828125\,\mu^*_4/R^5\\
\alpha_1 &= 28.7109375\,\mu^*_1/R^3 - 103.359375\,\mu^*_3/R^5 + 81.2109375\,\mu^*_5/R^7\\
\alpha_2 &= -8.203125\,\mu^*_0/R^3 + 68.90625\,\mu^*_2/R^5 - 73.828125\,\mu^*_4/R^7\\
\alpha_3 &= -103.359375\,\mu^*_1/R^5 + 442.96875\,\mu^*_3/R^7 - 378.984375\,\mu^*_5/R^9\\
\alpha_4 &= 7.3828125\,\mu^*_0/R^5 - 73.828125\,\mu^*_2/R^7 + 86.1328125\,\mu^*_4/R^9\\
\alpha_5 &= 81.2109375\,\mu^*_1/R^7 - 378.984375\,\mu^*_3/R^9 + 341.0859375\,\mu^*_5/R^{11}
\end{aligned}
\tag{4.41}
$$


(iv) Polynomial of Degree 6

$$
\begin{aligned}
\alpha_0 &= 2.3925781\,\mu^*_0/R - 21.5332031\,\mu^*_2/R^3 + 47.3730469\,\mu^*_4/R^5 - 29.3261719\,\mu^*_6/R^7\\
\alpha_1 &= 28.7109375\,\mu^*_1/R^3 - 103.359375\,\mu^*_3/R^5 + 81.2109375\,\mu^*_5/R^7\\
\alpha_2 &= -21.5332031\,\mu^*_0/R^3 + 348.8378906\,\mu^*_2/R^5 - 913.6230469\,\mu^*_4/R^7 + 615.8496094\,\mu^*_6/R^9\\
\alpha_3 &= -103.359375\,\mu^*_1/R^5 + 442.96875\,\mu^*_3/R^7 - 378.984375\,\mu^*_5/R^9\\
\alpha_4 &= 47.3730469\,\mu^*_0/R^5 - 913.6230469\,\mu^*_2/R^7 + 2605.5175781\,\mu^*_4/R^9 - 1847.5488281\,\mu^*_6/R^{11}\\
\alpha_5 &= 81.2109375\,\mu^*_1/R^7 - 378.984375\,\mu^*_3/R^9 + 341.0859375\,\mu^*_5/R^{11}\\
\alpha_6 &= -29.3261719\,\mu^*_0/R^7 + 615.8496094\,\mu^*_2/R^9 - 1847.5488281\,\mu^*_4/R^{11} + 1354.8691406\,\mu^*_6/R^{13}
\end{aligned}
\tag{4.42}
$$


(v) Polynomial of Degree 7

$$
\begin{aligned}
\alpha_0 &= 2.3925781\,\mu^*_0/R - 21.5332031\,\mu^*_2/R^3 + 47.3730469\,\mu^*_4/R^5 - 29.3261719\,\mu^*_6/R^7\\
\alpha_1 &= 64.5996094\,\mu^*_1/R^3 - 426.3574219\,\mu^*_3/R^5 + 791.8066406\,\mu^*_5/R^7 - 439.8925781\,\mu^*_7/R^9\\
\alpha_2 &= -21.5332031\,\mu^*_0/R^3 + 348.8378906\,\mu^*_2/R^5 - 913.6230469\,\mu^*_4/R^7 + 615.8496094\,\mu^*_6/R^9\\
\alpha_3 &= -426.3574219\,\mu^*_1/R^5 + 3349.9511719\,\mu^*_3/R^7 - 6774.3457031\,\mu^*_5/R^9 + 3959.0332031\,\mu^*_7/R^{11}\\
\alpha_4 &= 47.3730469\,\mu^*_0/R^5 - 913.6230469\,\mu^*_2/R^7 + 2605.5175781\,\mu^*_4/R^9 - 1847.5488281\,\mu^*_6/R^{11}\\
\alpha_5 &= 791.8066406\,\mu^*_1/R^7 - 6774.3457031\,\mu^*_3/R^9 + 14410.8808594\,\mu^*_5/R^{11} - 8709.8730469\,\mu^*_7/R^{13}\\
\alpha_6 &= -29.3261719\,\mu^*_0/R^7 + 615.8496094\,\mu^*_2/R^9 - 1847.5488281\,\mu^*_4/R^{11} + 1354.8691406\,\mu^*_6/R^{13}\\
\alpha_7 &= -439.8925781\,\mu^*_1/R^9 + 3959.0332031\,\mu^*_3/R^{11} - 8709.8730469\,\mu^*_5/R^{13} + 5391.8261719\,\mu^*_7/R^{15}
\end{aligned}
\tag{4.43}
$$

(For simplicity, R is used instead of R(t) in the above equations.)

From the values of the $\alpha_i$'s thus obtained and the midpoint M(t), we can find the values of the coefficients $a_i$ by means of (4.24).
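The constants tabulated in (4.39) through (4.43) are the elements of $\tfrac12 C_{(0)}^{-1}$ and $\tfrac12 C_{(1)}^{-1}$, so they can be regenerated for any degree. The sketch below (illustrative names, NumPy assumed) builds $C_{(0)}$ and $C_{(1)}$ from (4.37) and (4.38), divides element (p, q) of each half-inverse by the appropriate power of R, and applies (4.35) and (4.36) to the moments $\mu^*_0, \dots, \mu^*_m$ about the midpoint. For degree 3 it reproduces 1.125, -1.875, 5.625, 9.375, -13.125 and 21.875.

```python
import numpy as np

def half_inverse_C(n, parity):
    """(1/2) C^{-1} of eq. (4.37) (parity 0) or eq. (4.38) (parity 1), order n."""
    p = np.arange(1, n + 1)
    s = 2 * (p[:, None] + p[None, :]) - (3 if parity == 0 else 1)
    return 0.5 * np.linalg.inv(1.0 / s)

def alpha_from_midpoint_moments(mu_star, R):
    """Coefficients alpha_0..alpha_m in t* from mu*_0..mu*_m, eqs. (4.35)-(4.36)."""
    mu_star = np.asarray(mu_star, dtype=float)
    m = len(mu_star) - 1
    alpha = np.zeros(m + 1)
    for parity in (0, 1):
        idx = np.arange(parity, m + 1, 2)        # orders 0, 2, 4, ... or 1, 3, 5, ...
        if idx.size == 0:
            continue
        # Dividing element (p, q) by R**(i + j + 1) equals R**(2(p+q)-3) or R**(2(p+q)-1).
        K = half_inverse_C(idx.size, parity) / R ** (idx[:, None] + idx[None, :] + 1)
        alpha[idx] = K @ mu_star[idx]
    return alpha

print(half_inverse_C(2, 0))                      # degree-3 constants for alpha_0, alpha_2
print(half_inverse_C(2, 1))                      # degree-3 constants for alpha_1, alpha_3
```

The only matrices inverted here are $C_{(0)}$ and $C_{(1)}$, which do not depend on the interval; the $\alpha_i$ obtained would then be converted to the $a_i$ by (4.24).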

(IV.5) Expanded Use of the Method of Moments

As we have observed in the preceding sections, the method of moments for fitting a polynomial can be considered another procedure for obtaining the least squares solution. It has a definite advantage over the direct least squares solution, since the coefficients $\alpha_i$ (i = 0,1,2,...,m) can be computed by straightforward algebra, while the direct least squares procedure involves the inversion of the matrix A.

This fact implies that we can adopt the method of moments to fit a polynomial approximating any target function, which need not be a density function or a frequency distribution. In fact, in the present study we used the method to approximate the square root of the test information function of the Old Test, among others, which facilitated the transformation of ability $\theta$ to $\tau$. The rationale behind this method will be described in the following chapter.


(IV.6) Selection of the Interval

When we fit a polynomial to a frequency distribution or a set of observations, the selection of the interval is more or less automatic. When we use the method of moments to fit a polynomial to other kinds of functions, however, the goodness of fit of the polynomial to the target function depends largely upon our selection of the interval.

Figure 4-6-1 illustrates such a situation. In this figure, the square root of the test information function, $[I(\theta)]^{1/2}$, of Subtest 1 is drawn by a solid line. The dashed and dotted curves are the polynomials of degree 7 obtained by the method of moments using the intervals of $\theta$ [-3.0, 3.0] and [-4.0, 4.0], respectively. We can see that the latter polynomial fits the target function much better than the former. This implies that, although the interval of ability $\theta$ of our interest is even a little smaller than [-3.0, 3.0], in order to obtain a polynomial which fits the target function in this interval we must use a larger interval, such as [-4.0, 4.0].

[Horizontal axis: latent trait $\theta$]

FIGURE 4-6-1

Square Root of the Test Information Function of Subtest 1, $[I(\theta)]^{1/2}$ (Solid Line), and the Polynomials of Degree 7 Which Were Fitted by the Method of Moments with [-3.0, 3.0] (Dashes) and [-4.0, 4.0] (Dots) as the Interval of $\theta$, Respectively.

We cannot generalize this result too far, however. Figure 4-6-2 presents a similar set of curves for Subtest 2. Note that, while the polynomial obtained by using the interval [-4.0, 4.0] fits better than the one obtained by using [-3.0, 3.0], the former still shows a substantial discrepancy from the target function.

[Horizontal axis: latent trait $\theta$]

FIGURE 4-6-2

Square Root of the Test Information Function of Subtest 2, $[I(\theta)]^{1/2}$ (Solid Line), and the Polynomials of Degree 7 Which Were Fitted by the Method of Moments with [-3.0, 3.0] (Dashes) and [-4.0, 4.0] (Dots) as the Interval of $\theta$, Respectively.


Figure 4-6-3 presents the result obtained by using three subintervals of [-4.0, 4.0], with $\theta$ = -1.5 and 0.5 as the cutting points. These three polynomials are uniformly of degree 4. We can see that, together, they fit the target function very well. This is another way of using the method of moments.

[Horizontal axis: latent trait $\theta$]

FIGURE 4-6-3

Square Root of the Test Information Function of Subtest 2, $[I(\theta)]^{1/2}$ (Solid Line), and the Three Polynomials of Degree 4 (Dots), Which Were Fitted by the Method of Moments Using the Three Subintervals of $\theta$.
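The subinterval strategy of Figure 4-6-3 can be sketched as follows. Because the test information function of Subtest 2 is not given in this section, the example uses the standard normal density as a stand-in target; that choice, the helper names and the 64-point Gauss-Legendre quadrature used for the moments are assumptions of the sketch, while the cutting points -1.5 and 0.5 on [-4.0, 4.0] and the degree 4 are taken from the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def _gauss(lo, hi, n=64):
    """Gauss-Legendre nodes and weights on [lo, hi]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (hi - lo) * x + 0.5 * (hi + lo), 0.5 * (hi - lo) * w

def moment_fit(h, lo, hi, m):
    """Degree-m polynomial on [lo, hi] matching the 0-th..m-th moments of h."""
    t, w = _gauss(lo, hi)
    V = np.vander(t, m + 1, increasing=True)
    return np.linalg.solve(V.T @ (w[:, None] * V), V.T @ (w * h(t)))

def sq_error(h, coef, lo, hi):
    """Squared L2 distance between h and the polynomial on [lo, hi]."""
    t, w = _gauss(lo, hi)
    return float(np.sum(w * (h(t) - P.polyval(t, coef)) ** 2))

def piecewise_vs_single(h, cuts, m):
    """Total squared error: one fit over the whole range vs one fit per subinterval."""
    pieces = list(zip(cuts[:-1], cuts[1:]))
    single = moment_fit(h, cuts[0], cuts[-1], m)
    e_single = sum(sq_error(h, single, a, b) for a, b in pieces)
    e_piece = sum(sq_error(h, moment_fit(h, a, b, m), a, b) for a, b in pieces)
    return e_single, e_piece

target = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # stand-in target
print(piecewise_vs_single(target, [-4.0, -1.5, 0.5, 4.0], 4))
```

Since each piece minimizes the squared error on its own subinterval, the total squared error of the pieces can never exceed that of one polynomial of the same degree fitted over the whole interval. The pieces are not constrained to meet at the cutting points, however, and this guarantee says nothing about empirical data such as those of Figure 4-6-4, where the moments of each subset rest on fewer observations.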

The use of subintervals may be effective when we apply the method of moments to fit polynomials to relatively smooth mathematical functions. The same is not necessarily true, however, if we use the method of moments for empirical data. Figure 4-6-4 illustrates such examples. Our data are again the set of five hundred maximum likelihood estimates $\hat{\theta}$; in the first graph the set was reclassified into lower and upper subsets of 250 observations each, and in the second graph, in a similar manner, it was divided into five subsets of 100 observations each. The polynomials shown in these two graphs are uniformly of degree 4. We can see that neither result is appropriate for use as the estimated density function, $g(\hat{\theta})$.

To conclude, the selection of the interval or intervals is very important if the method of moments is to be used successfully for fitting a polynomial or polynomials, and we must make a good judgment in each situation, considering the expected shape of the target function and the nature of our data.

FIGURE 4-6-4

Polynomial Approximations of the Density Function $g(\hat{\theta})$ of the Set of Five Hundred Maximum Likelihood Estimates $\hat{\theta}$, Obtained upon the Original Old Test by the Method of Moments, by Dividing the Total Set into Two Subsets (Left) and into Five Subsets (Right).

There are many examples other than those illustrated here, and the reader is referred to the research report RR-79-2 and to many others, such as RR-78-1, RR-80-2 and RR-80-4.

(IV.7) Comparison of the Results Obtained by the Method of Moments and by the Direct Least Squares Procedure

The polynomials obtained by the method of moments and by the direct least squares method were compared using the standard normal distribution function as the target function (cf. RR-79-2). The comparison was made for several intervals of $\theta$ to which these methods were applied, and, as expected, in most cases the two resultant polynomials are identical.

There are somewhat different results, however. Figure 4-7-1 presents such an example. In this figure, the polynomial obtained by the method of moments is plotted by dots, and the one obtained by the direct least squares method is shown by short dashes. In both cases the interval of $\theta$, [-6.0, 6.0], was adopted. Note that, while the result obtained by the method of moments fits the target function reasonably well over the total interval of $\theta$, the one obtained by the direct least squares solution diverges quickly from the target function outside the interval (-2.0, 2.0). This divergence comes from the limited capacity of the computer in inverting the matrix A. This example therefore also suggests that it is wise to use the method of moments instead of the direct least squares method. In the same figure, the corresponding two polynomials obtained by using the interval of $\theta$ [-3.0, 3.0] are also plotted. Since they are identical, they are drawn together by crosses, and only for the interval where the curves diverge from the target function.

[Vertical axis: distribution function]

FIGURE 4-7-1

Polynomials of Degree 7 Obtained by the Method of Moments (Dots) and the Least Squares Solution (Short Dashes), with the Interval [-6.0, 6.0], and the Taylor's Series (Long Dashes), Approximating the Standard Normal Distribution Function (Solid Line). Those Obtained by the First Two Methods Using the Interval [-3.0, 3.0] Are Also Plotted (Crosses).

We recall that there is another type of polynomial, obtained from Taylor's series. Using Hermite polynomials (Kendall and Stuart, 1963), we can write the Taylor's series for the standard normal distribution function, $\Phi(\theta)$, such that

$$ \Phi(\theta) = 0.500000 + 0.398942\,\theta - 0.0664903\,\theta^3 + 0.00997355\,\theta^5 - 0.00118732\,\theta^7 + 0.000115434\,\theta^9 - 0.00000944465\,\theta^{11} + \cdots \tag{4.44} $$

The resultant polynomial of degree 7 is drawn by long dashes in Figure 4-7-1. Note that the fit of this polynomial to the target function is better within the interval of $\theta$, (-1.7, 1.7), but outside this interval it diverges from the target function quickly. This is a common tendency across the results for polynomials of different degrees (cf. RR-79-2).
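The coefficients in (4.44) can be regenerated by integrating the power series of the normal density term by term, $\Phi(\theta) = \tfrac12 + \phi(0)\sum_{n \ge 0} (-1)^n \theta^{2n+1} / [2^n\, n!\,(2n+1)]$ with $\phi(0) = (2\pi)^{-1/2}$; this is a different route from the Hermite-polynomial derivation cited above but gives the same series. The sketch (illustrative names, standard library only) truncates the series at degree 7 and compares it with $\Phi$ computed from `math.erf`.

```python
import math

def taylor_phi_coefficients(max_degree):
    """Coefficients c_k of Phi(theta) ~ sum_k c_k theta**k about theta = 0."""
    c = [0.0] * (max_degree + 1)
    c[0] = 0.5
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)        # 0.398942...
    n = 0
    while 2 * n + 1 <= max_degree:
        c[2 * n + 1] = phi0 * (-1) ** n / (2 ** n * math.factorial(n) * (2 * n + 1))
        n += 1
    return c

def Phi(theta):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(theta / math.sqrt(2.0)))

def taylor_phi(theta, max_degree=7):
    """Truncated Taylor polynomial of Phi, eq. (4.44)."""
    return sum(ck * theta ** k for k, ck in enumerate(taylor_phi_coefficients(max_degree)))

for theta in (1.0, 3.0):
    print(theta, taylor_phi(theta), Phi(theta))
```

At $\theta = 1$ the degree-7 polynomial is within about $10^{-4}$ of $\Phi$, while at $\theta = 3$ it gives roughly -0.27 instead of about 0.9987, illustrating the rapid divergence outside (-1.7, 1.7).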


References

[1] Elderton, W. P. and Johnson, N. L. Systems of frequency curves. Cambridge University Press, 1969.

[2] Kendall, M. G. and Stuart, A. The advanced theory of statistics (Vol. 1). New York: Hafner, 1963.


V  Estimation  of  the  Operating  Characteristics  of  the  Discrete  Item 
Responses  and  That  of  Ability  Distributions;  II 

In the present chapter, following Chapter 3, we shall continue integrating the rationale and findings of this part of the research. Throughout this process the method of moments will frequently be used, especially for fitting polynomials. The reasons for choosing the polynomial in preference to other functions were described in the preceding chapter; among others, fitting a polynomial by the method of moments provides us with the least squares solution.

(V.1) Estimated Operating Characteristics Which Are Directly Observable from Our Calibration Data

Since our data are simulated data, the proportion correct for each of the ten unknown binary items (cf. Section III.6) is directly observable. Figure 5-1-1 illustrates two sets of the proportion correct for item 6, by solid and dashed lines, respectively, together with the theoretical item characteristic function. The subinterval widths used for these two curves are 0.05 and 0.25, respectively. Thus, in the first case, five hypothetical examinees sharing the same position (cf. Section III.3) make up the total frequency for each subinterval of $\theta$, and, in the second case, twenty-five examinees sharing five adjacent positions make up the total frequency. We can see from these results that they are by no means good approximations to the theoretical item characteristic function, because of their large fluctuations. The reason is that we have only five hundred hypothetical examinees in our calibration data.

FIGURE 5-1-1

Proportion Correct for Item 6 Using the Subinterval Width 0.05 (Solid Line) and 0.25 (Dashed Line), and the Similar Result Obtained by Using the Maximum Likelihood Estimate $\hat{\theta}$ Instead of Ability $\theta$ and the Subinterval Width 0.25 (Dotted Line), Together with the Item Characteristic Function (Solid Curve).
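A minimal sketch of how curves like those in Figure 5-1-1 can be tabulated: examinees are grouped into consecutive subintervals of a chosen width (0.05 or 0.25 in the text) and the proportion of correct responses is computed in each. Passing maximum likelihood estimates instead of true abilities gives the analogue of the dotted curve. The function name and arguments are hypothetical, and empty subintervals are simply dropped.

```python
import numpy as np

def proportion_correct(theta, correct, lo, width):
    """Proportion correct in consecutive subintervals [lo + k*width, lo + (k+1)*width)."""
    theta = np.asarray(theta, dtype=float)
    correct = np.asarray(correct, dtype=float)       # 1 = success, 0 = failure
    k = np.floor((theta - lo) / width).astype(int)   # subinterval index of each examinee
    if k.min() < 0:
        raise ValueError("all theta values must be >= lo")
    n = np.bincount(k)                               # examinees per subinterval
    s = np.bincount(k, weights=correct)              # successes per subinterval
    mids = lo + width * (np.arange(n.size) + 0.5)    # subinterval midpoints
    keep = n > 0                                     # skip empty subintervals
    return mids[keep], s[keep] / n[keep]

mids, p = proportion_correct([0.01, 0.02, 0.30, 0.31], [1, 0, 1, 1], lo=0.0, width=0.25)
print(mids, p)                                       # [0.125 0.375] [0.5 1. ]
```

With five or twenty-five examinees per subinterval, each proportion is a binomial ratio whose standard deviation can be as large as $\sqrt{0.25/5} \approx 0.22$ or $\sqrt{0.25/25} = 0.1$, which is consistent with the large fluctuations seen in the figure.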

It should be noted that these two curves in Figure 5-1-1 are not observable if our calibration data are empirical data. In practice, therefore, the closest we can get from empirical data is the proportion correct based upon the maximum likelihood estimate $\hat{\theta}$ instead of ability $\theta$ itself. This third proportion correct for item 6 is also plotted in Figure 5-1-1 by a dotted line, using the set of five hundred maximum likelihood estimates obtained upon our original Old Test (cf. Section III.3). The subinterval width for this proportion correct is 0.25, as was the case with the second curve based upon ability $\theta$. Again, we can see that the fluctuations from the true item characteristic function are large.

As was pointed out in Section III.7, the use of indirect information obtainable from our calibration data will ameliorate the situation. If our results provide better approximations to the theoretical item characteristic function than those three curves do, therefore, we shall be content to conclude that our methods are successful.

(V.2) Necessary Correction for the Scale of the Maximum Likelihood Estimate When Used As a Substitute for the Ability Scale

It is commonly taken for granted that, whenever the scale of the maximum likelihood estimate is available, it can directly be used as a substitute for the ability scale. The reader may wonder, therefore, why we need an elaborate process of estimating the operating characteristics when the set of maximum likelihood estimates of ability is available. One reason is that, if our calibration data contain only several hundred examinees, sampling fluctuations prevent them from providing a good approximation to the theoretical operating characteristics, as we have seen in the example given in the preceding section.

Our next question will be: Is it justifiable to use the scale of the maximum likelihood estimate, and the proportion correct based on it, as the estimated item characteristic function when we have a large set of calibration data, such as one based upon twenty thousand examinees? The answer still must be "No," or "Not without some modification."

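Let us assume that our Old Test provides approximate unbiasedness of the maximum likelihood estimate $\hat{\theta}$, and normality of its conditional distribution given ability $\theta$, for the interval of $\theta$, $(\underline{\theta}, \bar{\theta})$, of our interest. Thus we can write

$$ E(\hat{\theta} \mid \theta) = \theta. \tag{5.1} $$

From (5.1), we obtain for the expectation of $\hat{\theta}$

$$ E(\hat{\theta}) = \int E(\hat{\theta} \mid \theta)\, f(\theta)\, d\theta = E(\theta). \tag{5.2} $$

Since $\hat{\theta} - E(\hat{\theta}) = \{\theta - E(\theta)\} + (\hat{\theta} - \theta)$ by (5.2), the binomial theorem gives, for the m-th moment of $\hat{\theta}$ about the mean,

$$ E[\hat{\theta} - E(\hat{\theta})]^m = \sum_{r=0}^{m} \binom{m}{r} E\left[\{\theta - E(\theta)\}^{m-r}\, E\{(\hat{\theta} - \theta)^r \mid \theta\}\right]. \tag{5.3} $$

From (5.3), using (5.1) and the fact that the odd conditional moments of $\hat{\theta} - \theta$ vanish under conditional normality, we can write for the specific cases where m = 2, 3 and 4

$$ \mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\theta) + E[\mathrm{Var}(\hat{\theta} \mid \theta)], \tag{5.4} $$

$$ E[\{\hat{\theta} - E(\hat{\theta})\}^3] = E[\{\theta - E(\theta)\}^3] + 3E[\{\theta - E(\theta)\}\, \mathrm{Var}(\hat{\theta} \mid \theta)] \tag{5.5} $$

and

$$ E[\{\hat{\theta} - E(\hat{\theta})\}^4] = E[\{\theta - E(\theta)\}^4] + 6E[\{\theta - E(\theta)\}^2\, \mathrm{Var}(\hat{\theta} \mid \theta)] + E[E\{(\hat{\theta} - \theta)^4 \mid \theta\}]. \tag{5.6} $$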

The above results imply that the distribution of the maximum likelihood estimate $\hat{\theta}$ is different from that of ability $\theta$ and, above all, has a larger variance. Since the proportion correct is the ratio of two such distributions (the frequency of success over the total frequency at each value of $\hat{\theta}$), these results indicate that it contains a bias in itself.

The correction for this distortion can be made in the following way. Let us assume, tentatively, that the square root of the test information function of our Old Test is approximately constant for the interval $(\underline{\theta}, \bar{\theta})$, as is the case with our original Old Test (cf. Section III.4). Then the conditional distribution of $\hat{\theta}$, given $\theta$, is approximately $N(\theta, \sigma)$, where $\sigma = [I(\theta)]^{-1/2}$ is the reciprocal of the constant square root of the test information function. Under this condition, formulas (5.4) through (5.6) simplify to

$$ \mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\theta) + \sigma^2, \tag{5.7} $$

$$ E[\{\hat{\theta} - E(\hat{\theta})\}^3] = E[\{\theta - E(\theta)\}^3] \tag{5.8} $$

and

$$ E[\{\hat{\theta} - E(\hat{\theta})\}^4] = E[\{\theta - E(\theta)\}^4] + 6\sigma^2\, \mathrm{Var}(\theta) + 3\sigma^4. \tag{5.9} $$

Thus the distribution of $\hat{\theta}$ has the same mean and the same third moment about the mean as that of $\theta$.
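Under the assumptions of (5.7) through (5.9), namely conditional normality of $\hat{\theta}$ with a constant $\sigma$, the relations can be run backwards to recover the central moments of $\theta$ from those of $\hat{\theta}$. The sketch below is illustrative and not part of the report; it assumes the sample moments of $\hat{\theta}$ and the value of $\sigma$ are already available, and the function name is hypothetical.

```python
def ability_moments_from_mle(var_hat, mu3_hat, mu4_hat, sigma):
    """Invert (5.7)-(5.9): central moments of theta from those of theta-hat.

    Assumes theta-hat | theta ~ N(theta, sigma) with sigma = [I(theta)]**(-1/2) constant.
    """
    var_theta = var_hat - sigma**2                                  # (5.7)
    if var_theta <= 0:
        raise ValueError("Var(theta-hat) must exceed sigma**2 under this model")
    mu3_theta = mu3_hat                                             # (5.8)
    mu4_theta = mu4_hat - 6 * sigma**2 * var_theta - 3 * sigma**4   # (5.9)
    return var_theta, mu3_theta, mu4_theta

print(ability_moments_from_mle(1.25, 0.2, 3.6875, 0.5))            # (1.0, 0.2, 2.0)
```

For instance, Var($\theta$) = 1, a third moment of 0.2, a fourth moment of 2.0 and $\sigma = 0.5$ give 1.25, 0.2 and 3.6875 for $\hat{\theta}$ through (5.7)-(5.9), and the function returns the original three values.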

The regression of ability $\theta$ on the maximum likelihood estimate $\hat{\theta}$ is given in Chapter 3 as (3.22). To reproduce it, we have

$$ E(\theta \mid \hat{\theta}) = \hat{\theta} + \sigma^2 \frac{d}{d\hat{\theta}} \log g(\hat{\theta}), \tag{5.10} $$

where

$$ g(\hat{\theta}) = \int \psi(\hat{\theta} \mid \theta)\, f(\theta)\, d\theta. \tag{5.11} $$

Note that the regression given by (5.10) is not necessarily linear, although that of the maximum likelihood estimate $\hat{\theta}$ on ability $\theta$ is. (5.10) can be evaluated if we approximate the density function $g(\hat{\theta})$ by, say, a polynomial obtained by the method of moments. We can therefore shift each value of $\hat{\theta}$ to $E(\theta \mid \hat{\theta})$, so that the proportion correct for a specific value of $\hat{\theta}$ becomes a function of the corresponding value of $E(\theta \mid \hat{\theta})$.
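Once $g(\hat{\theta})$ has been approximated by a polynomial, (5.10) only needs $d \log g / d\hat{\theta} = g'/g$. The sketch below assumes the polynomial coefficients are given in increasing order of powers (for example, after converting method-of-moments coefficients with (4.24)) and that $\sigma$ is the constant of the preceding paragraphs; the function name is hypothetical. It rejects points where the fitted polynomial is not positive, since a polynomial density can dip below zero in the tails.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def regress_theta_on_mle(theta_hat, g_coef, sigma):
    """Eq. (5.10): theta_hat + sigma**2 * g'(theta_hat) / g(theta_hat), g a polynomial."""
    x = np.asarray(theta_hat, dtype=float)
    g = P.polyval(x, g_coef)                         # g(theta_hat)
    dg = P.polyval(x, P.polyder(g_coef))             # g'(theta_hat)
    if np.any(g <= 0):
        raise ValueError("polynomial approximation of g is not positive here")
    return x + sigma**2 * dg / g

print(regress_theta_on_mle([-1.0, 0.0, 1.0], [1.0, 0.5], 0.5))
```

With a constant g the shift vanishes and $E(\theta \mid \hat{\theta}) = \hat{\theta}$; with $g \propto 1 + 0.5\hat{\theta}$ and $\sigma = 0.5$, the point $\hat{\theta} = 0$ is shifted to 0.125.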


When the square root of the test information function of our Old Test is not constant, as is the case with each of the nine subtests of our original Old Test, we cannot directly apply the above method. In such a case, we must transform $\theta$ to $\tau$, follow the whole process using $\tau$ instead of $\theta$, and then retransform $\tau$ to $\theta$. The rationale behind this transformation is given in Section III.8, and its actual procedure, using a polynomial obtained by the method of moments to approximate the square root of the test information function, will be given in the following section.


The observations made in this section have nothing to do with our own methods and approaches for estimating ability distributions and the operating characteristics of discrete item responses, however. In the present study, either the conditional distribution of ability $\theta$, given its maximum likelihood estimate $\hat{\theta}$, or the bivariate distribution of $\theta$ and $\hat{\theta}$ is approximated from our calibration data. Our procedures therefore do not involve the direct frequency ratios of the maximum likelihood estimate $\hat{\theta}$.


(V.3) Transformation of $\theta$ to $\tau$ Using the Method of Moments for Fitting a Polynomial

The rationale behind the transformation of $\theta$ to $\tau$ is given in Section III.8. This process is simplified if we approximate the square root of the test information function of our Old Test by a polynomial fitted by the method of moments. In doing so, the right selection of the interval of $\theta$ to which the method of moments is applied is very important, as was explained in Section IV.6.

We can write

(5.12)  $[I(\theta)]^{1/2} \approx \sum_{k=0}^{m} a_k \theta^k$ ,

where m is the degree of the polynomial we wish to obtain. Substituting (5.12) into (3.40), we obtain


(5.13)  $\tau = C^{-1} \sum_{k=0}^{m} a_k (k+1)^{-1} \theta^{k+1} + d = \sum_{k=0}^{m+1} d_k \theta^k$ ,

where

(5.14)  $d_0 = d$ ,  $d_k = C^{-1} k^{-1} a_{k-1}$ ,  $k = 1, 2, \ldots, m+1$ .


Thus the transformation of θ to τ can be made through another polynomial, of degree (m+1). Considering that (3.40) otherwise requires a tedious numerical integration of [I(θ)]^{1/2}, the straightforward method given by (5.13) and (5.14) saves a substantial amount of time and labor.
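Because (5.13) and (5.14) amount to term-by-term integration of (5.12), the new coefficients can be computed directly. The sketch below uses hypothetical values for a_k and C, sets the additive constant d to zero by default, and checks the intended property: the square root of the information on the τ scale, [I(θ)]^{1/2}(dτ/dθ)^{-1}, equals C wherever the fitted polynomial is positive. Function names are illustrative only.

```python
def poly_eval(coefs, x):
    """Evaluate sum_k coefs[k] * x**k by Horner's rule."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def tau_coefficients(a, C, d=0.0):
    """Coefficients d_0, ..., d_{m+1} of tau(theta), following (5.13)-(5.14).

    a: coefficients a_0, ..., a_m of the polynomial (5.12) for [I(theta)]^{1/2};
    C: target constant; d: the additive constant in (5.13).
    """
    return [d] + [a[k - 1] / (k * C) for k in range(1, len(a) + 1)]


def sqrt_info_on_tau_scale(theta, a, C):
    """[I(theta)]^{1/2} / (d tau / d theta); equals C where the polynomial is positive."""
    d = tau_coefficients(a, C)
    slope = poly_eval([k * d[k] for k in range(1, len(d))], theta)
    return poly_eval(a, theta) / slope


a = [4.0, 0.0, -0.2]   # hypothetical fit of [I(theta)]^{1/2}, positive on [-4, 4]
C = 4.0                # hypothetical target constant
print(tau_coefficients(a, C))
print(sqrt_info_on_tau_scale(1.5, a, C))
```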

Figure 5-3-1 presents the transformation of θ to τ obtained by this method for Subtests 1 and 2, as representatives of the nine subtests. In all nine cases, the interval [-4.0, 4.0] was used in applying the method of moments.

Figure 5-3-2 presents the resultant square root of the test information function, [I*(τ)]^{1/2}, for Subtests 1 and 2. As expected from the different degrees of fit of the polynomials to the respective [I(θ)]^{1/2}'s in these two cases, which are shown in Figures 4-6-1 and 4-6-2 of Section IV.6, the resultant [I*(τ)]^{1/2} for Subtest 1 is closer to the target constant C than the one for Subtest 2. For each of the other seven subtests, the result is similar to one of these two, or its fit lies somewhere between the two.

FIGURE 5-3-1
Transformation of θ to τ for Subtests 1 and 2.

FIGURE 5-3-2
Square Root of Test Information Function, [I*(τ)]^{1/2}, and the Target Constant C for Subtests 1 and 2. (Horizontal axis: τ.)

(V.4)  Classification  of  Methods  and  Approaches 

Various methods and approaches for estimating the operating characteristics of discrete item responses, and for estimating ability distributions, were developed in the present study. For convenience, by a method we mean a way of approximating the conditional density function of ability θ or τ, given its maximum likelihood estimate θ̂ or τ̂, and by an approach we mean a way of producing the ability distributions of separate discrete response groups, and hence the operating characteristics (cf. Section III.1). They are summarized as follows.

(A)  Methods

   (i)  Pearson System Method
   (ii)  Two-Parameter Beta Method
   (iii)  Normal Approach Method

(B)  Approaches

   (i)  Bivariate P.D.F. Approach
   (ii)  Histogram Ratio Approach
   (iii)  Curve Fitting Approach
   (iv)  Conditional P.D.F. Approach
        (a)  Simple Sum Procedure
        (b)  Weighted Sum Procedure
        (c)  Proportioned Sum Procedure

Prior to the present study, the author had developed a method (Samejima, 1977) of estimating the operating characteristics of discrete item responses, which was later called the Normal Approximation Method. In the classification given above, this method belongs to the Bivariate P.D.F. Approach. Although it had been developed before the author started the present study, a brief description of it will be given in Section V.5, so that the reader will understand its characteristics and how it differs from the combinations of a method and an approach that are the main products of the present study.

(V.5)  Normal  Approximation  Method 

Let h(θ̂) be the linear function of θ̂ which minimizes the quantity Q, such that

(5.15)  $Q = E[\theta - h(\hat\theta)]^2$ .

We obtain

(5.16)  $h(\hat\theta) = \mathrm{Cov}(\theta, \hat\theta)\,[\mathrm{Var}(\hat\theta)]^{-1}\,[\hat\theta - E(\hat\theta)] + E(\theta)$ ,

where Cov(θ, θ̂) denotes the covariance of ability θ and its maximum likelihood estimate θ̂.

When the square root of the test information function of our Old Test is approximately constant over the interval of θ of our interest, as is the case with our original Old Test, we can write from (5.7)

(5.17)  $\mathrm{Cov}(\theta, \hat\theta) = \mathrm{Var}(\theta) = \mathrm{Var}(\hat\theta) - \sigma^2$ .

Substituting (5.17) into (5.16), using E(θ) = E(θ̂), and rearranging, we obtain

(5.18)  $h(\hat\theta) = [1 - \sigma^2\{\mathrm{Var}(\hat\theta)\}^{-1}]\,[\hat\theta - E(\hat\theta)] + E(\hat\theta)$
        $= [1 - \sigma^2\{\mathrm{Var}(\hat\theta)\}^{-1}]\,\hat\theta + \sigma^2[\mathrm{Var}(\hat\theta)]^{-1} E(\hat\theta)$
        $= \alpha\hat\theta + \beta$ .

From this result, it is obvious that the two coefficients, α and β, can be estimated from the set of maximum likelihood estimates. When the joint distribution of θ and θ̂ is normal, this function h(θ̂) becomes the regression of θ on θ̂. In such a case, the conditional distribution of θ, given θ̂, is normal, with the common conditional variance

(5.19)  $\mathrm{Var}(\theta \mid \hat\theta) = \sigma^2[1 - \sigma^2\{\mathrm{Var}(\hat\theta)\}^{-1}]$ .
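As a small worked illustration of (5.18) and (5.19), the coefficients α and β and the conditional variance can be computed from one group's maximum likelihood estimates. This sketch assumes the divide-by-N form of Var(θ̂) and a constant standard error σ; the function name and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def normal_approximation_coefficients(theta_hats, sigma):
    """alpha and beta of (5.18) and Var(theta | theta_hat) of (5.19).

    theta_hats: maximum likelihood estimates of one group of examinees;
    sigma: constant standard error of estimation.
    """
    mean = fmean(theta_hats)
    var_hat = pvariance(theta_hats, mu=mean)   # Var(theta_hat), divide-by-N form
    if var_hat <= sigma ** 2:
        raise ValueError("Var(theta_hat) must exceed sigma**2 for (5.19) to be positive")
    alpha = 1.0 - sigma ** 2 / var_hat
    beta = sigma ** 2 * mean / var_hat
    cond_var = sigma ** 2 * alpha                # (5.19)
    return alpha, beta, cond_var


print(normal_approximation_coefficients([-0.8, -0.1, 0.3, 0.9, 1.4], 0.215))
```

Note that h(E(θ̂)) = αE(θ̂) + β = E(θ̂), so the fitted line always passes through the group mean.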

In the Normal Approximation Method, a bivariate normal distribution is assumed for the joint distribution of θ and θ̂ for each subpopulation of examinees who share the same discrete item response to an unknown test item. With our calibration data, there are two groups of examinees, i.e., the success and failure groups, for each of the ten binary test items (cf. Section III.6).

For each of the five hundred maximum likelihood estimates θ̂_s, a single value of θ was generated by the Monte Carlo method from the conditional normal distribution of θ, given θ̂_s, implied by this assumption. Let θ̃_s denote this generated value of θ. Then we have two subsets of θ̃, for the success and failure groups of item h, respectively. The ratio of the frequency distribution of the success group to the sum of the two frequency distributions gives the estimated item characteristic function of item h.
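The Monte Carlo procedure just described can be sketched as below. It is an illustration under stated assumptions, not the original program: the regression coefficients are recomputed within each item score group as in (5.18), each draw comes from the normal conditional distribution with variance (5.19), and the bin width of 0.25 follows the text, while the interval [-4, 4], the seed and all names are hypothetical choices.

```python
import math
import random
from statistics import fmean, pvariance


def group_regression(theta_hats, sigma):
    """alpha, beta of (5.18) and Var(theta | theta_hat) of (5.19) within one group."""
    mean = fmean(theta_hats)
    var_hat = pvariance(theta_hats, mu=mean)
    alpha = 1.0 - sigma ** 2 / var_hat
    return alpha, sigma ** 2 * mean / var_hat, sigma ** 2 * alpha


def normal_approximation_icf(success_hats, failure_hats, sigma, n_draws=1,
                             width=0.25, lo=-4.0, hi=4.0, seed=0):
    """Histogram-ratio estimate of an item characteristic function.

    For each theta_hat, n_draws values theta_tilde are drawn from the normal
    conditional distribution of theta given theta_hat fitted within that
    examinee's item score group; returns bin midpoints and success proportions
    (None for empty bins).
    """
    rng = random.Random(seed)
    n_bins = int(round((hi - lo) / width))
    hists = []
    for hats in (success_hats, failure_hats):
        alpha, beta, cond_var = group_regression(hats, sigma)
        sd = math.sqrt(cond_var)   # ValueError if the conditional variance is negative
        counts = [0] * n_bins
        for th_hat in hats:
            for _ in range(n_draws):
                theta_tilde = rng.gauss(alpha * th_hat + beta, sd)
                b = math.floor((theta_tilde - lo) / width)
                if 0 <= b < n_bins:
                    counts[b] += 1
        hists.append(counts)
    mids = [lo + (b + 0.5) * width for b in range(n_bins)]
    ratios = [s / (s + f) if s + f else None for s, f in zip(*hists)]
    return mids, ratios
```

Setting n_draws to 5 or 10 for five hundred examinees corresponds to the 2,500 and 5,000 values of θ̃ used for Figure 5-5-1.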

Figure 5-5-1 presents by hollow circles the estimated item characteristic function of item 6 thus obtained, using 0.25 as the subinterval width of the frequency distributions of θ̃. Also presented in the same figure, by solid triangles and hollow squares, are the estimated item characteristic functions obtained by producing five and ten θ̃'s, respectively, for each of the five hundred maximum likelihood estimates θ̂_s, in order to increase the accuracy of estimation. We can see that even with five hundred θ̃'s the estimated item characteristic function is fairly close to the theoretical item characteristic function, and it becomes closer when we increase the number of θ̃'s to 2,500 and to 5,000.

When our Old Test does not have a constant square root of the test information function over the interval of θ of our interest, as is the case with the nine subtests of the original Old Test, we can transform θ to τ and follow the same process. To obtain the estimated operating characteristics, we can retransform τ to θ after the process has been completed (cf. Sections III.8 and V.3).

FIGURE 5-5-1
Estimated Item Characteristic Functions of Item 6 Based upon 500 θ̃'s (Hollow Circles), upon 2,500 θ̃'s (Solid Triangles) and upon 5,000 θ̃'s (Hollow Squares), by the Normal Approximation Method, Using the Original Old Test. (Horizontal axis: latent trait θ.)

(V.6)  Approximation to the Density Function of the Maximum
Likelihood Estimate by a Polynomial Obtained by the
Method of Moments

Note that, in the Normal Approximation Method, the marginal density function g(θ̂) is not used at all. In contrast, the present study makes full use of this marginal density function. In so doing, we approximate g(θ̂) or g(τ̂), depending upon whether the transformation of θ to τ is necessary, by a polynomial obtained by the method of moments. An example of this approximation was already given in Section IV.1 as Figure 4-2-1. In this example, three polynomials, of degrees 3, 4 and 5, were fitted to the total set of five hundred maximum likelihood estimates θ̂_s, which are based upon the original Old Test. These three situations are called Degree 3 Case, Degree 4 Case and Degree 5 Case, respectively.

Figure 5-6-1 presents another example of approximating the density function by a polynomial obtained by the method of moments. In this example, however, the target density function is divided into two portions, belonging to those who answered item h correctly and to those who did not, respectively, and three polynomials of degrees 3, 4 and 5 were fitted to each portion. The result illustrated here is for item 6, and the original Old Test was used for producing the five hundred maximum likelihood estimates.

FIGURE 5-6-1
Approximations to the Two Portions of the Density Function, g(θ̂), for the Success and Failure Groups of Item 6, Respectively, by Polynomials of Degree 3 (Dots), of Degree 4 (Short Dashes) and of Degree 5 (Long Dashes) Obtained by the Method of Moments. Maximum Likelihood Estimates Are Based upon the Original Old Test, and Are Shown As Two Histograms. (Horizontal axis: maximum likelihood estimate θ̂.)

To distinguish the two subsets of maximum likelihood estimates from each other, the histogram of θ̂ for the failure group is marked with crosses, and the one for the success group is marked with solid triangles. The two polynomials of degree 3 are drawn by dotted lines, those of degree 4 by short dashed lines, and those of degree 5 by long dashed lines. This is an example chosen from those used in the Bivariate P.D.F. Approach, which will be introduced in Section V.10. From these approximated density functions, we can obtain the estimated conditional moments of ability θ, given its maximum likelihood estimate θ̂, through formulas (3.22) through (3.25).
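This section does not restate how the method of moments fits a polynomial to a density, so the sketch below encodes one common reading, which is an assumption here: choose coefficients c_0, ..., c_m so that the polynomial's moments of order 0 through m over a fixed interval (taken as [-4, 4], the interval reported in Section V.3) equal the corresponding sample moments, the order-0 moment being one. Nothing forces the fitted polynomial to stay positive, which is consistent with the few exclusions reported in Table 5-6-1. The function names and sample are hypothetical.

```python
import random


def fit_polynomial_density(samples, degree, a=-4.0, b=4.0):
    """Coefficients c_0..c_degree of p(x) = sum_k c_k x^k on [a, b] whose moments
    of order 0..degree equal the sample moments (assumed reading of the method).
    Samples outside [a, b] are ignored."""
    xs = [x for x in samples if a <= x <= b]
    n = len(xs)
    target = [sum(x ** j for x in xs) / n for j in range(degree + 1)]

    def int_power(p):
        """Integral of x**p over [a, b]."""
        return (b ** (p + 1) - a ** (p + 1)) / (p + 1)

    A = [[int_power(j + k) for k in range(degree + 1)] for j in range(degree + 1)]
    return _solve(A, target)


def _solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]   # hypothetical MLEs
coefs = fit_polynomial_density(sample, 4)            # a "Degree 4 Case" fit
print(coefs)
```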

Table 5-6-1 presents the number of hypothetical examinees in each of the two subgroups, i.e., those who answered each of the ten unknown, binary test items correctly and those who did not.

TABLE 5-6-1
Numbers of Hypothetical Examinees Who Belong to the Success and Failure Groups of Each of the Ten Unknown, Binary Test Items. The Negative Number Shown in Brackets After Each Entry Indicates the Number of Examinees to Be Subtracted When We Use Degree 4 Case for the Total Set of Maximum Likelihood Estimates Which Are Based upon the Original Old Test.

  Item h    Failure Subgroup    Success Subgroup
     1          22 (-3)            478 (-4)
     2          68 (-1)            432 (-6)
     3         100 (-3)            400 (-4)
     4         150 (-3)            350 (-4)
     5         202 (-3)            298 (-4)
     6         246 (-3)            254 (-4)
     7         302 (-3)            198 (-4)
     8         345 (-3)            155 (-4)
     9         399 (-3)            101 (-4)
    10         429 (-4)             71 (-3)

There are seven hypothetical examinees to be excluded in Degree 4 Case when we use the maximum likelihood estimates based upon the original Old Test and either the Two-Parameter Beta Method or the Normal Approach Method, which will be introduced in Sections V.8 and V.9. For one of them, the estimated density function g(θ̂) assumes a negative value, and, for the other six, the estimated conditional variance Var(θ|θ̂) turned out to be negative. The frequencies to be subtracted from those of the success and failure groups for each of the ten unknown, binary test items are shown in brackets in Table 5-6-1. Exclusions of examinees also happened in some other situations where we used different methods and/or different Old Tests, but the number of examinees excluded never exceeded nineteen.

In most of our studies both Degree 3 and Degree 4 Cases were used, and sometimes Degree 5 Case was added. As it turned out, in all situations, the resultant estimated item characteristic functions of the ten unknown, binary test items are practically identical across the cases over the meaningful range of ability θ. This indicates that our methods and approaches are robust with respect to the choice of polynomial approximating the density function g(θ̂).

(V.7)  Pearson System Method

We shall assume that the square root of the test information function, [I(θ)]^{1/2}, of our Old Test is not constant, as is the case in most practical situations. Thus we need the transformation of θ to τ and, at the end of the whole process, the retransformation of τ to θ, whose rationale and actual procedure were described in Sections III.8 and V.3. If the Old Test has a constant amount of test information, as is the case with our original Old Test, the reader may simply replace τ by θ. Let φ(τ|τ̂) denote the conditional density function of τ, given its maximum likelihood estimate τ̂. Recall that τ̂ is obtained from θ̂ through the same polynomial transformation which was introduced in Section V.3. We can write for the first through fourth conditional moments of τ, given τ̂,

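(5.20)  $E(\tau \mid \hat\tau) = \hat\tau + C^{-2} \dfrac{d}{d\hat\tau} \log g(\hat\tau)$ ,

(5.21)  $\mathrm{Var}(\tau \mid \hat\tau) = C^{-2}\left[1 + C^{-2} \dfrac{d^2}{d\hat\tau^2} \log g(\hat\tau)\right]$ ,

(5.22)  $E[\{\tau - E(\tau \mid \hat\tau)\}^3 \mid \hat\tau] = C^{-6}\left[\dfrac{d^3}{d\hat\tau^3} \log g(\hat\tau)\right]$ ,

and

(5.23)  $E[\{\tau - E(\tau \mid \hat\tau)\}^4 \mid \hat\tau] = C^{-4}\left[3 + 6C^{-2} \dfrac{d^2}{d\hat\tau^2} \log g(\hat\tau) + 3C^{-4}\left\{\dfrac{d^2}{d\hat\tau^2} \log g(\hat\tau)\right\}^2 + C^{-4} \dfrac{d^4}{d\hat\tau^4} \log g(\hat\tau)\right]$ ,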

where C is the target constant for the square root of the test information function, [I*(τ)]^{1/2}. Substituting (5.21), (5.22) and (5.23) for μ₂, μ₃ and μ₄ in (4.2) and (4.3), we obtain the two indices β₁ and β₂, and, from these two values and (4.1), Pearson's criterion κ is obtained. These indices, which can be computed for any fixed value of τ̂, indicate which type of distribution in Pearson's system (Elderton and Johnson, 1969; Johnson and Kotz, 1970) we should turn to for φ(τ|τ̂). A brief summary of this procedure is as follows.


   Type I (Beta distribution, general):        κ < 0
   Type II (Beta distribution, symmetric):     κ = 0, β₁ = 0, β₂ < 3
   Type III (gamma distribution):              κ = ∞, 2β₂ - 3β₁ - 6 = 0
   Type IV:                                    0 < κ < 1
   Type V:                                     κ = 1
   Type VI:                                    κ > 1
   Type VII (including t-distribution):        κ = 0, β₁ = 0, β₂ > 3
   Normal distribution:                        κ = 0, β₁ = 0, β₂ = 3
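The steps from (5.20) through (5.23) to a Pearson type can be sketched for a polynomial g. Formulas (4.1) through (4.3) are not reproduced in this section, so the sketch assumes the standard definitions β₁ = μ₃²/μ₂³, β₂ = μ₄/μ₂² and κ = β₁(β₂+3)²/[4(4β₂-3β₁)(2β₂-3β₁-6)]. Derivatives of log g are obtained from the ratios u_k = g^(k)/g; the tolerance used for the equality cases and all function names are hypothetical.

```python
def poly_eval(coefs, x):
    """Evaluate sum_k coefs[k] * x**k by Horner's rule."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def poly_deriv(coefs):
    """Coefficients of the derivative polynomial."""
    return [k * coefs[k] for k in range(1, len(coefs))]


def conditional_moments(tau_hat, g_coefs, C):
    """Mean and 2nd-4th central moments of tau given tau_hat, (5.20)-(5.23).

    Returns None where g(tau_hat) <= 0, i.e. where the density is undefined.
    """
    derivs = [g_coefs]
    for _ in range(4):
        derivs.append(poly_deriv(derivs[-1]))
    g = poly_eval(g_coefs, tau_hat)
    if g <= 0.0:
        return None
    u1, u2, u3, u4 = (poly_eval(d, tau_hat) / g for d in derivs[1:])
    # Derivatives of log g written in terms of u_k = g^(k) / g.
    l2 = u2 - u1 ** 2
    l3 = u3 - 3 * u1 * u2 + 2 * u1 ** 3
    l4 = u4 - 4 * u1 * u3 - 3 * u2 ** 2 + 12 * u1 ** 2 * u2 - 6 * u1 ** 4
    s2 = C ** -2
    mean = tau_hat + s2 * u1                                               # (5.20)
    mu2 = s2 * (1 + s2 * l2)                                               # (5.21)
    mu3 = s2 ** 3 * l3                                                     # (5.22)
    mu4 = s2 ** 2 * (3 + 6 * s2 * l2 + 3 * s2 ** 2 * l2 ** 2 + s2 ** 2 * l4)  # (5.23)
    return mean, mu2, mu3, mu4


def pearson_type(mu2, mu3, mu4, eps=1e-8):
    """Pearson type selected by beta_1, beta_2 and kappa (standard definitions)."""
    if mu2 <= 0 or mu4 <= 0:
        return "undefined"            # negative estimated moments
    b1 = mu3 ** 2 / mu2 ** 3
    b2 = mu4 / mu2 ** 2
    if b1 < eps:                      # symmetric case, kappa = 0
        if abs(b2 - 3) < eps:
            return "normal"
        return "II" if b2 < 3 else "VII"
    denom = 4 * (4 * b2 - 3 * b1) * (2 * b2 - 3 * b1 - 6)
    if abs(denom) < eps:
        return "III"                  # kappa is infinite
    kappa = b1 * (b2 + 3) ** 2 / denom
    if kappa < 0:
        return "I"
    if abs(kappa - 1) < eps:
        return "V"
    return "IV" if kappa < 1 else "VI"


C = 1 / 0.215                          # target constant of the original Old Test
for t in (0.0, 1.0):
    print(t, pearson_type(*conditional_moments(t, [1.0, 0.0, -0.1], C)[1:]))
```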

The estimated conditional density function φ̂(τ|τ̂) thus approximated has an important role in all four of our approaches, which will be introduced in Sections V.10 through V.14.

It is a characteristic of the Pearson System Method that we use all of the first four conditional moments of τ, given τ̂. Using these four conditional moments, the indices β₁, β₂ and κ are obtained, and they direct us to one of the Pearson System distributions. For example, when we approximate the density function g(θ̂), which is based upon the original Old Test, for the total group of examinees, in Degree 3 Case φ(θ|θ̂_s) turned out to be of Type I for 318 values of θ̂_s and of the normal distribution for 181 values of θ̂_s, and for the one remaining case it is undefined because the estimated fourth conditional moment is negative. In Degree 4 Case, φ(θ|θ̂_s) proved to be of Type I for 432 values of θ̂_s and of Type II for 54 values of θ̂_s; it is undefined for 13 values of θ̂_s because the estimated second and/or fourth conditional moments are negative, and for the one remaining case the estimated density g(θ̂_s) is negative, so that φ(θ|θ̂_s) is again undefined. If, for instance, φ(θ|θ̂_s) is of Type I, then the four parameters of the Beta distribution are estimated from the four conditional moments of θ, given θ̂_s, and so on.

In comparison with the other two methods, i.e., the Two-Parameter Beta Method and the Normal Approach Method, which will be introduced in Sections V.8 and V.9, the Pearson System Method is theoretically the soundest. It provides a wide variety of unrestricted curves for the estimated conditional density functions φ̂(τ|τ̂), which enables us to approximate the true conditional density functions well. Its disadvantage lies in the fact that the use of higher conditional moments, like the fourth moment, may lead to inaccurate estimation, as is implied by the two examples given in the preceding paragraph. If this is the case, we may use either the Two-Parameter Beta Method or the Normal Approach Method, each of which requires only the first two conditional moments.

(V.8)  Two-Parameter Beta Method

The Beta distribution is known for the abundance of different shapes of its density function, which include unimodal symmetric curves, unimodal asymmetric curves, J-shaped curves, U-shaped curves, and linear functions. For this reason, the distribution has been used by many researchers in approximating empirical distributions. In the Pearson System Method, which was introduced in the preceding section, the Beta distribution appears as two of the Pearson System distributions, i.e., Types I and II. When we approximate the conditional density φ(τ|τ̂) by a Beta density function, we can write

(5.24)  $\hat\phi(\tau \mid \hat\tau) = [B(p_{\hat\tau}, q_{\hat\tau})]^{-1} (\tau - a_{\hat\tau})^{p_{\hat\tau}-1} (b_{\hat\tau} - \tau)^{q_{\hat\tau}-1} (b_{\hat\tau} - a_{\hat\tau})^{-(p_{\hat\tau}+q_{\hat\tau}-1)}$ ,

where p_τ̂, q_τ̂, a_τ̂ and b_τ̂ are the four parameters of the Beta distribution, and B(p_τ̂, q_τ̂) is the Beta function, which is given by

(5.25)  $B(p_{\hat\tau}, q_{\hat\tau}) = \int_0^1 u^{p_{\hat\tau}-1} (1-u)^{q_{\hat\tau}-1}\, du$ .

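These four parameters are estimated from the first four conditional moments of τ, given τ̂, and the resulting indices β₁ and β₂ (cf. Section IV.1). We can write

(5.26)  $p_{\hat\tau},\, q_{\hat\tau} = (r/2)\left[1 \pm (r+2)\left\{\beta_1\left[\beta_1(r+2)^2 + 16(r+1)\right]^{-1}\right\}^{1/2}\right]$ ,

(5.27)  $b_{\hat\tau} - a_{\hat\tau} = \left(E[\{\tau - E(\tau \mid \hat\tau)\}^2 \mid \hat\tau]\right)^{1/2} \left\{\beta_1(r+2)^2 + 16(r+1)\right\}^{1/2} / 2$ ,

(5.28)  $a_{\hat\tau} = E(\tau \mid \hat\tau) - p_{\hat\tau}(b_{\hat\tau} - a_{\hat\tau})/r$ ,

and

(5.29)  $b_{\hat\tau} = E(\tau \mid \hat\tau) + q_{\hat\tau}(b_{\hat\tau} - a_{\hat\tau})/r$ ,

where

(5.30)  $r = 6(\beta_2 - \beta_1 - 1)/(6 + 3\beta_1 - 2\beta_2)$ .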

When the two parameters p_τ̂ and q_τ̂ are equal, the Beta distribution becomes Pearson's Type II distribution, and we have

(5.31)  $p_{\hat\tau} = q_{\hat\tau} = r/2$ ;

otherwise, it is Pearson's Type I distribution.
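Equations (5.26) through (5.30) can be coded directly. In this sketch the two roots of (5.26) are assigned to p_τ̂ and q_τ̂ by the sign of the third conditional moment (the smaller root goes to p_τ̂ for positive skew, since a Beta density with p < q is skewed to the right); the text leaves this ordering implicit, so treat it as an assumption. The function name and the Beta(2, 5) check case are hypothetical.

```python
import math


def beta_from_four_moments(mean, mu2, mu3, mu4):
    """Parameters (p, q, a, b) of a four-parameter Beta density by (5.26)-(5.30).

    mean: E(tau | tau_hat); mu2, mu3, mu4: second to fourth conditional central
    moments. Assumed convention: the smaller root of (5.26) is p when mu3 > 0.
    """
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6.0 * (beta2 - beta1 - 1.0) / (6.0 + 3.0 * beta1 - 2.0 * beta2)   # (5.30)
    big = beta1 * (r + 2.0) ** 2 + 16.0 * (r + 1.0)
    root = (r + 2.0) * math.sqrt(beta1 / big)
    smaller, larger = r / 2.0 * (1.0 - root), r / 2.0 * (1.0 + root)     # (5.26)
    p, q = (smaller, larger) if mu3 > 0 else (larger, smaller)
    span = math.sqrt(mu2) * math.sqrt(big) / 2.0                          # (5.27): b - a
    a = mean - p * span / r                                               # (5.28)
    b = mean + q * span / r                                               # (5.29)
    return p, q, a, b


# Check with the moments of a Beta(2, 5) density on [0, 1] (a hypothetical test case).
p0, q0 = 2.0, 5.0
s = p0 + q0
mu2 = p0 * q0 / (s ** 2 * (s + 1))
skew = 2 * (q0 - p0) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(p0 * q0))
excess = 6 * ((p0 - q0) ** 2 * (s + 1) - p0 * q0 * (s + 2)) / (p0 * q0 * (s + 2) * (s + 3))
print(beta_from_four_moments(p0 / s, mu2, skew * mu2 ** 1.5, (3 + excess) * mu2 ** 2))
```

Feeding in the exact moments of a known Beta density and recovering its parameters is a convenient way to confirm that the formulas have been transcribed correctly.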

When two of the four parameters of the Beta distribution, a_τ̂ and b_τ̂, which are the lower and upper endpoints of the interval over which the density function assumes positive values, are given a priori, the estimation of the other two parameters is much simplified. In fact, we only need the first two conditional moments of τ, given τ̂, in addition to the set values of a_τ̂ and b_τ̂. We have

(5.32)  $p_{\hat\tau} = (1 - M_1) M_1^2 M_2^{-1} - M_1$ ,

and

(5.33)  $q_{\hat\tau} = M_1 (1 - M_1)^2 M_2^{-1} - (1 - M_1)$ ,

where M_1 and M_2 are defined by

(5.34)  $M_1 = [E(\tau \mid \hat\tau) - a_{\hat\tau}]\,(b_{\hat\tau} - a_{\hat\tau})^{-1}$ ,

and

(5.35)  $M_2 = \mathrm{Var}(\tau \mid \hat\tau)\,(b_{\hat\tau} - a_{\hat\tau})^{-2}$ .

In the Two-Parameter Beta Method, we adopt a priori set parameters a_τ̂ and b_τ̂, estimate the other two parameters, p_τ̂ and q_τ̂, accordingly, and use them in (5.24) for the estimated conditional density φ̂(τ|τ̂). This method has an advantage over the Pearson System Method in the sense that we only need the first two conditional moments of τ, given τ̂, instead of four, and yet we can make use of the abundance of different shapes of the Beta density function. The biggest problem is how to select suitable values for a_τ̂ and b_τ̂ for each fixed value of τ̂. In the present study, these values were chosen relatively arbitrarily, and we adopted

(5.36)  $a_{\hat\tau} = \hat\tau - 2.55\,C^{-1}$ ,  $b_{\hat\tau} = \hat\tau + 2.55\,C^{-1}$ ,

where C is the target constant for the square root of the test information function, [I*(τ)]^{1/2}. In fact, this method was used only for the original Old Test, so C^{-1} equals σ (= 0.215).


Although all the results obtained in the present study, which will be presented in later sections, turned out to be very good, the selection of suitable values for a_τ̂ and b_τ̂ remains to be investigated in the future, to make the Two-Parameter Beta Method theoretically sounder and more useful.
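A sketch of the Two-Parameter Beta Method under the choice (5.36), with the endpoints placed 2.55 C^{-1} on either side of τ̂. The conditional moments fed in are hypothetical, as are the function names; the Beta function of (5.25) is evaluated through math.lgamma rather than by numerical integration.

```python
import math


def two_parameter_beta(tau_hat, cond_mean, cond_var, C, k=2.55):
    """(p, q, a, b) from (5.32)-(5.36); a and b are fixed at tau_hat -/+ k / C."""
    a = tau_hat - k / C                          # (5.36)
    b = tau_hat + k / C
    m1 = (cond_mean - a) / (b - a)               # (5.34)
    m2 = cond_var / (b - a) ** 2                 # (5.35)
    p = (1.0 - m1) * m1 ** 2 / m2 - m1           # (5.32)
    q = m1 * (1.0 - m1) ** 2 / m2 - (1.0 - m1)   # (5.33)
    if p <= 0.0 or q <= 0.0:
        raise ValueError("conditional moments are incompatible with the fixed endpoints")
    return p, q, a, b


def beta_density(tau, p, q, a, b):
    """Four-parameter Beta density (5.24); B(p, q) of (5.25) via log-gamma."""
    if not a < tau < b:
        return 0.0
    log_beta_fn = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp((p - 1.0) * math.log(tau - a) + (q - 1.0) * math.log(b - tau)
                    - (p + q - 1.0) * math.log(b - a) - log_beta_fn)


# Hypothetical conditional moments at tau_hat = 0.5, with C^{-1} = 0.215.
C = 1.0 / 0.215
p, q, a, b = two_parameter_beta(0.5, 0.55, 0.04, C)
print(p, q, a, b, beta_density(0.55, p, q, a, b))
```

By construction the fitted density reproduces the two conditional moments exactly; the endpoints only need to be wide enough that M₂ < M₁(1 - M₁), otherwise p or q would be non-positive.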

(V.9)  Normal  Approach  Method 

A simple, straightforward method of approximating the conditional density function φ(τ|τ̂), using only the first two conditional moments of τ, given τ̂, is the approximation by a normal density function. We can write

(5.37)  $\hat\phi(\tau \mid \hat\tau) = [2\pi v_{\hat\tau}^2]^{-1/2} \exp[-(\tau - \mu_{\hat\tau})^2 (2 v_{\hat\tau}^2)^{-1}]$ ,

where

(5.38)  $\mu_{\hat\tau} = E(\tau \mid \hat\tau)$ ,

and

(5.39)  $v_{\hat\tau}^2 = \mathrm{Var}(\tau \mid \hat\tau)$ .

An advantage of this method over the Pearson System Method is that we need only the first two conditional moments, and one over the Two-Parameter Beta Method is that we do not need any a priori set parameters. An obvious disadvantage is that it restricts the estimated conditional density to be a unimodal, symmetric function, regardless of its true shape. In spite of this restriction, however, the Normal Approach Method worked very well in combination with both the Bivariate P.D.F. Approach and the Conditional P.D.F. Approach, as the results presented in succeeding sections will show.

(V.10)  Bivariate  P.D.F.  Approach 

As was introduced in Section V.5, in the Normal Approximation Method we approximate the bivariate distribution of τ and τ̂ (or of θ and θ̂, if the square root of the test information function of our Old Test is constant) by a bivariate normal distribution, for each subpopulation of examinees who share the same item score on item h. Our results turned out to be quite successful.

Figure 5-10-1 presents, by dotted lines, the theoretical regression of θ on θ̂, which is based upon the original Old Test, together with the intervals of the standard error above and below this regression, for each of the success and failure groups of item 1. Also presented in the same figure, by dashed lines, are the empirical linear regression of θ on θ̂ and the intervals of the empirical standard error above and below it, which were introduced in Section V.5. We can see in this figure that for the success group these two sets of curves are almost identical over most of the meaningful range of θ, while the discrepancies are substantial for the failure group. This failure group of item 1 is the only extreme case: thirteen of the remaining eighteen cases gave results similar to that for the success group of item 1, four cases show slight discrepancies, and the one remaining case lies between the two examples of Figure 5-10-1 in the size of its discrepancies.

FIGURE 5-10-1
Comparison of the Theoretical Regression of Ability θ on Its Maximum Likelihood Estimate θ̂ (Dotted Line) with the Best Fitted Line of Ability θ on θ̂ (Dashed Line), for Each Item Score Group of Item 1. Also the Standard Errors of Estimation Are Shown on Each Side of the Regression and of the Best Fitted Line. (Axes: latent trait θ and maximum likelihood estimate θ̂.)

The results illustrated in Figure 5-10-1 suggest that we may need to investigate an approach other than approximating the joint distribution of τ and τ̂ by the bivariate normal distribution. This can be done by making use of the marginal density functions of τ̂ and the conditional density functions of τ, given τ̂, for the separate subpopulations of examinees.

A 

Let g_{x_h}(τ̂) denote the proportion of the density function of the maximum likelihood estimate τ̂ for the subpopulation of examinees who share the same item score x_h (= 0, 1, ..., m_h), and let φ_{x_h}(τ|τ̂) and ξ_{x_h}(τ, τ̂) be the corresponding conditional density of τ, given τ̂, and the proportion of the bivariate density of τ and τ̂, respectively. We can write

(5.40)  $\xi_{x_h}(\tau, \hat\tau) = \phi_{x_h}(\tau \mid \hat\tau)\, g_{x_h}(\hat\tau)$ ,

where

(5.41)  $g(\hat\tau) = \sum_{x_h=0}^{m_h} g_{x_h}(\hat\tau)$

and

(5.42)  $\xi(\tau, \hat\tau) = \sum_{x_h=0}^{m_h} \xi_{x_h}(\tau, \hat\tau)$ .

To obtain the estimate of the proportion of the bivariate density, ξ_{x_h}(τ, τ̂), we classify the set of N τ̂_s's into (m_h+1) item score categories, depending upon the item score x_h (= 0, 1, ..., m_h) that examinee s obtained on a new test item h, for which the operating characteristics are to be estimated. The method of moments is applied to each of these (m_h+1) subsets of τ̂, and the shared density function g_{x_h}(τ̂) is estimated for each subgroup. The conditional moments of τ, given τ̂, are also obtained for the separate subgroups, using formulas (5.20) through (5.23) with g(τ̂) replaced by (N/N_{x_h}) g_{x_h}(τ̂), where N_{x_h} denotes the number of examinees whose item score on item h is x_h. Based on these estimated conditional moments, the parameters of the specific density function adopted for φ_{x_h}(τ|τ̂) are obtained for each subgroup x_h. The choice of φ_{x_h}(τ|τ̂) depends upon which of the three methods, i.e., the Normal Approach Method, the Two-Parameter Beta Method or the Pearson System Method, is taken. The bivariate density function of τ and τ̂ is obtained from (5.40) for each of the (m_h+1) subgroups. Then the estimated operating characteristic P̂_{x_h}(θ) [= P̂*_{x_h}(τ(θ))] is given by

(5.43)  $\hat P_{x_h}(\theta) = \int_{-\infty}^{\infty} \hat\xi_{x_h}(\tau, \hat\tau)\, d\hat\tau \left[\sum_{j=0}^{m_h} \int_{-\infty}^{\infty} \hat\xi_{j}(\tau, \hat\tau)\, d\hat\tau\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ .
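The Bivariate P.D.F. Approach combined with the Normal Approach Method can be sketched as follows. Assumptions, none taken from the study: each subgroup's own density of τ̂ is a hypothetical polynomial normalized over [-4, 4] and weighted by N_{x_h}/N, so that the weighted polynomial plays the role of g_{x_h}(τ̂); the integrals in (5.43) are approximated by the trapezoidal rule over that interval; and grid points where the polynomial or the conditional variance is not positive are skipped, in the spirit of the exclusions of Section V.6. The result is on the τ scale; evaluating it at τ(θ) gives P̂_{x_h}(θ).

```python
import math


def poly_eval(coefs, x):
    """Evaluate sum_k coefs[k] * x**k by Horner's rule."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def poly_deriv(coefs):
    """Coefficients of the derivative polynomial."""
    return [k * coefs[k] for k in range(1, len(coefs))]


def operating_characteristics(tau, groups, C, lo=-4.0, hi=4.0, n=801):
    """Estimated P_{x_h} at tau for every item score x_h, following (5.40)-(5.43).

    groups: one (N_x / N, coefficients) pair per item score; the coefficients
    describe that subgroup's own density of tau_hat (hypothetical fits).
    phi_x is the normal density (5.37) with moments (5.20)-(5.21).
    """
    s2 = C ** -2
    step = (hi - lo) / (n - 1)
    integrals = []
    for share, coefs in groups:
        d1, d2 = poly_deriv(coefs), poly_deriv(poly_deriv(coefs))
        total = 0.0
        for i in range(n):
            t_hat = lo + i * step
            g = poly_eval(coefs, t_hat)
            if g <= 0.0:
                continue                           # negative density: point skipped
            l1 = poly_eval(d1, t_hat) / g
            l2 = poly_eval(d2, t_hat) / g - l1 ** 2
            mean, var = t_hat + s2 * l1, s2 * (1.0 + s2 * l2)
            if var <= 0.0:
                continue                           # negative conditional variance: skipped
            phi = math.exp(-(tau - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
            weight = 0.5 if i in (0, n - 1) else 1.0   # trapezoidal rule
            total += weight * phi * share * g          # xi_x(tau, tau_hat), (5.40)
        integrals.append(total * step)
    denom = sum(integrals)
    return [value / denom for value in integrals]


# Hypothetical binary item: failure density decreasing, success density increasing.
C = 1 / 0.215
groups = [(0.5, [0.125, -0.025]), (0.5, [0.125, 0.025])]
print([operating_characteristics(t, groups, C)[1] for t in (-1.0, 0.0, 1.0)])
```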

This approach was applied to our data (cf. Section III.3) in combination with the Normal Approach Method (cf. Section V.9) for Degree 3, 4 and 5 Cases. We used the five hundred maximum likelihood estimates θ̂_s, which were based upon the original Old Test. The polynomials of degrees 3, 4 and 5 approximating g_{x_h}(θ̂) for each of the two subpopulations, i.e., the success and failure groups, are illustrated for h = 6 in Section V.6 as Figure 5-6-1.

Figure 5-10-2 presents the resultant estimated ability distributions in Degree 3 (dotted curve), 4 (short dashed curve) and 5 (long dashed curve) Cases, together with the theoretical density (solid curve) and the frequency distribution of θ (histogram with solid diamonds), for each of the success and failure groups. We can see in this figure that, except for the lower end of θ for the failure group and the upper end of θ for the success group, the three curves of Degree 3, 4 and 5 Cases are very close to the theoretical curves. The results for the other nine binary test items are similar to this example. In some cases the fit is best in Degree 5 Case and worst in Degree 3 Case, but this order does not hold in all cases. In most cases, the resultant three curves are close to one another, as we can see in Figure 5-10-2.

FIGURE 5-10-2
Estimated Proportions of the Density Function of Ability θ in Degree 3 (Dotted Curve), 4 (Short, Dashed Curve) and 5 (Long, Dashed Curve) Cases of the Bivariate P.D.F. Approach with the Normal Approach Method, for Each of the Success and Failure Subpopulations. Actual Frequencies (Solid Line with Diamonds) and the Theoretical Proportion of the Density Function (Solid Curve) Are Also Drawn.

Figure 5-10-3 presents the resultant three estimated item characteristic functions of Degree 3, 4 and 5 Cases for item 6, which were obtained from (5.43) with x_h = 1 and m_h = 1, by dotted, short dashed and long dashed curves, respectively. We can see in this figure that all these results are close to the theoretical item characteristic function, which is also shown in the figure by a solid curve, and are much closer to it than the frequency ratios of θ̂ for the correct answer, which are shown by a solid line with diamonds. If we compare these three estimated item characteristic functions with one another, we can say that the result of Degree 3 Case is not as good as the other two. This is not a general tendency, however. For most of the other nine test items, the resultant estimated item characteristic functions of Degree 3 Case are much closer to the corresponding theoretical item characteristic functions, and, in fact, for item 7 Degree 3 Case shows the best fit among the three.

FIGURE 5-10-3
Estimated Item Characteristic Functions of Item 6 for Degree 3 (Dotted Curve), 4 (Short, Dashed Curve) and 5 (Long, Dashed Curve) Cases of the Bivariate P.D.F. Approach with the Normal Approach Method, Together with the Theoretical Item Characteristic Function (Solid Curve) and the Actual Frequency Ratios (Solid Line with Diamonds).

(V.11) Histogram Ratio Approach

In this approach, and also in the Curve Fitting Approach and the Conditional P.D.F. Approach, which will be introduced in the following two sections, we make use of the estimated conditional density function of $\tau$, $\hat\phi(\tau|\hat\tau_s)$, which is evaluated for the maximum likelihood estimate, $\hat\tau_s$, of each individual examinee $s$. This is what distinguishes these three approaches from the Bivariate P.D.F. Approach, in which $\hat\phi(\tau|\hat\tau)$ is used for approximating the bivariate density function of $\tau$ and $\hat\tau$, as we have observed in the preceding section.

Using the Monte Carlo method, we have the computer produce a specified number of $\tau$'s following the estimated conditional density function, $\hat\phi(\tau|\hat\tau_s)$, for each value of $\hat\tau_s$. Let $\tau^*$ denote the values of $\tau$ thus produced, as we did in the Normal Approach Method, and let $\nu$ be the number of $\tau^*$'s produced for each $\hat\tau_s$. The resultant set of $\tau^*$'s is classified into $(m_h+1)$ categories, depending upon the item score ($x_h = 0, 1, \ldots, m_h$) which the examinee $s$ obtained for item $h$. Then each $\tau^*$ is transformed to $\theta^*$ by means of

(5.44)  $\theta^{*} = \tau^{-1}[\tau^{*}]$ .

When $\tau(\theta)$ is given by the polynomial shown as (5.14), this transformation can easily be performed by the Newton-Raphson Method.
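The inversion in (5.44) can be sketched in Python as follows. This is a minimal illustration, assuming $\tau(\theta)$ is supplied as a list of polynomial coefficients in increasing powers of $\theta$ and is strictly increasing over the range searched; the cubic used below is hypothetical and is not the polynomial of (5.14), and the helper names are illustrative only.

```python
def poly_eval(coefs, x):
    """Evaluate coefs[0] + coefs[1]*x + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def poly_deriv(coefs):
    """Coefficients of the derivative polynomial (increasing powers)."""
    return [k * c for k, c in enumerate(coefs)][1:]


def invert_tau(tau_star, coefs, theta0=0.0, tol=1e-10, max_iter=100):
    """Solve tau(theta) = tau_star for theta by Newton-Raphson.

    Assumes tau(theta) is strictly increasing where the iterates fall,
    so the derivative is positive and the root is unique.
    """
    dcoefs = poly_deriv(coefs)
    theta = theta0
    for _ in range(max_iter):
        slope = poly_eval(dcoefs, theta)
        if slope == 0.0:
            raise ZeroDivisionError("zero derivative at theta = %g" % theta)
        step = (poly_eval(coefs, theta) - tau_star) / slope
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical monotone cubic: tau(theta) = 0.1 + 1.2*theta + 0.05*theta**3
coefs = [0.1, 1.2, 0.0, 0.05]
theta_star = invert_tau(poly_eval(coefs, 1.5), coefs)
```

Because the derivative of this cubic is positive everywhere, each $\tau^*$ maps back to a single $\theta^*$; a polynomial that is not monotone over the range of interest would need a bracketing safeguard in addition to Newton-Raphson.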

We divide the interval of $\theta$ of our interest into subintervals of equal width. Let $t$ denote a subinterval, $\theta_t$ be the midpoint of the subinterval $t$, and $H_{x_h}(\theta^*\in t)$ denote the frequency of $\theta^*$'s which belong to the item score $x_h$ and the subinterval $t$. We have for the estimated operating characteristic of the item score $x_h$

(5.45)  $\hat{P}_{x_h}(\theta_t) = H_{x_h}(\theta^*\in t)\left[\sum_{j=0}^{m_h} H_j(\theta^*\in t)\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ .
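A minimal Python sketch of (5.45), assuming the $\theta^*$'s have already been produced and transformed by (5.44) and are paired with the item score of the examinee they came from. The function name and the toy data are hypothetical; a subinterval that receives no $\theta^*$ is returned as `None`, since (5.45) is undefined there.

```python
def histogram_ratio(theta_star, scores, m_h, lower, upper, width):
    """Estimate P_{x_h}(theta_t) by (5.45) at each subinterval midpoint.

    theta_star: transformed Monte Carlo values theta*
    scores: item score x_h (0..m_h) of the examinee each theta* came from
    Returns (midpoints, table); table[t][x] is the estimated operating
    characteristic of score x at midpoint t, or None for an empty bin.
    """
    n_bins = int(round((upper - lower) / width))
    counts = [[0] * (m_h + 1) for _ in range(n_bins)]
    for th, x in zip(theta_star, scores):
        if lower <= th < upper:
            t = min(int((th - lower) // width), n_bins - 1)
            counts[t][x] += 1                  # H_{x_h}(theta* in t)
    midpoints = [lower + (t + 0.5) * width for t in range(n_bins)]
    table = []
    for row in counts:
        total = sum(row)                       # sum over j = 0..m_h
        table.append([h / total for h in row] if total else None)
    return midpoints, table


# Tiny hypothetical data set for a binary item (m_h = 1)
theta_star = [-1.1, -0.9, -0.8, 0.2, 0.3, 0.4, 0.6]
scores = [0, 0, 1, 0, 1, 1, 1]
mids, p = histogram_ratio(theta_star, scores, 1, -1.0, 1.0, 0.5)
```

Values of $\theta^*$ outside the interval of interest (here $-1.1$) are simply ignored, which mirrors restricting (5.45) to the interval we divide into subintervals.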

This approach was applied to the set of five hundred maximum likelihood estimates, $\hat\theta_s$, which were obtained upon the original Old Test, in combination with the Two-Parameter Beta Method for approximating the conditional density function, $\hat\phi(\tau|\hat\tau_s)$. The number of hypothetical examinees actually used in Degree 4 Case is 493 (cf. Section V.6), while in Degree 3 Case all 500 examinees were used. In both cases, we adopted $\nu = 5$ and a subinterval width of 0.25. Figure 5-11-1 presents the resultant estimated item characteristic functions of item 6 for Degree 3 Case by triangles, and for Degree 4 Case by squares.


FIGURE 5-11-1

Estimated Item Characteristic Functions of Item 6 for Degree 3 (Triangles) and 4 (Squares) Cases of the Histogram Ratio Approach and Those for Degree 3-3 (Long, Dashed Curve) and 3-4 (Short, Dashed Curve) Cases of the Curve Fitting Approach, with the Two-Parameter Beta Method.

We can see that the two sets of estimates are fairly close to the theoretical item characteristic function of item 6, which is drawn by a solid curve in Figure 5-11-1. The fit is expected to be even better if we increase $\nu$ and decrease the subinterval width. Similar results were obtained for each of the other nine binary test items.

An advantage of the Histogram Ratio Approach over the others lies in its simplicity and straightforwardness. In order to obtain a smooth curve for the estimated operating characteristic, it is advisable to use a fairly large number for $\nu$, and a small width for the subinterval, $t$, of $\theta$.

(V.12)  Curve  Fitting  Approach 

This approach follows the same process as the Histogram Ratio Approach until we obtain $\nu N$ $\theta^*$'s, which are divided into $(m_h+1)$ subsets according to the item score $x_h$ for item $h$. Then a polynomial of a specified degree is fitted to each subset of $\theta^*$'s by the method of moments. Let $\eta_{x_h}(\theta)$ denote such a polynomial fitted for the $\theta^*$'s of the subset $x_h$. The estimated operating characteristic of the item score $x_h$ is given by

(5.46)  $\hat{P}_{x_h}(\theta) = \eta_{x_h}(\theta)\left[\sum_{j=0}^{m_h}\eta_j(\theta)\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ .
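The following Python sketch illustrates (5.46) under stated assumptions: each polynomial is fitted on a fixed interval $[a, b]$ by matching its total mass (one) and its first raw moments to those of the subset, and $\eta_{x_h}(\theta)$ is taken as that fitted density multiplied by the subset size, so that the ratio in (5.46) compares frequencies. Both the scaling and the interval are assumptions of this sketch rather than details taken from Section IV.1; the data and function names are hypothetical, and negative values of a fitted polynomial are not guarded against.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def moment_polynomial(sample, degree, a, b):
    """Coefficients of a polynomial density on [a, b] whose integral is 1
    and whose raw moments 1..degree equal those of the sample."""
    n = len(sample)
    mom = [sum(v ** j for v in sample) / n for j in range(degree + 1)]
    # A[j][k] = integral over [a, b] of theta**(j + k)
    A = [[(b ** (j + k + 1) - a ** (j + k + 1)) / (j + k + 1)
          for k in range(degree + 1)] for j in range(degree + 1)]
    return solve_linear(A, mom)


def curve_fitting_oc(subsets, degree, a, b):
    """Return a function theta -> [P_0, ..., P_m] following (5.46)."""
    fitted = [(len(s), moment_polynomial(s, degree, a, b)) for s in subsets]

    def oc(theta):
        # eta_x(theta): fitted density scaled by the subset size (assumption)
        eta = [size * sum(c * theta ** k for k, c in enumerate(coefs))
               for size, coefs in fitted]
        total = sum(eta)
        return [e / total for e in eta]

    return oc


# Hypothetical theta* subsets for a binary item: score 0 and score 1
failures = [-1.2, -0.8, -0.3, 0.1]
successes = [-0.2, 0.4, 0.9, 1.3, 1.8]
coefs0 = moment_polynomial(failures, 2, -2.5, 2.5)
oc = curve_fitting_oc([failures, successes], 2, -2.5, 2.5)
p_at_03 = oc(0.3)
```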

This approach was applied to the same set of $\theta^*$'s as we obtained for the Histogram Ratio Approach in the preceding section. Polynomials of both degree 3 and degree 4 were fitted to the resultant two subsets of $\theta^*$'s, which were obtained in each of Degree 3 and Degree 4 Cases. We shall call these four cases Degree 3-3, 3-4, 4-3 and 4-4 Cases, with the second number indicating the degree of the polynomials fitted to the subsets of $\theta^*$'s. An example of the curve fitting for Degree 3-3 and 3-4 Cases for item 4 was given in Section IV.1 as Figure 4-1-3.

The resultant estimated item characteristic functions for item 6 in Degree 3-3 and 3-4 Cases are shown in Figure 5-11-1 in the preceding section by long and short dashes, respectively, together with the results obtained by the Histogram Ratio Approach. Figure 5-12-1 presents the corresponding results for Degree 4-3 and 4-4 Cases by long and short dashes, respectively. We can see that all four of these results are very close to the theoretical item characteristic function, except at both ends of the curves.

In this example of item 6, the curve for Degree 3-4 fits the theoretical item characteristic function best. We cannot generalize this to the other items, however, and there is no systematic tendency as to which of the four cases provides the best fitting curves.

FIGURE 5-12-1

Estimated Item Characteristic Functions of Item 6 for Degree 3 (Triangles) and 4 (Squares) Cases of the Histogram Ratio Approach and Those for Degree 4-3 (Long, Dashed Curve) and 4-4 (Short, Dashed Curve) Cases of the Curve Fitting Approach, with the Two-Parameter Beta Method.


(V.13)  Conditional  P.D.F.  Approach 

In this approach, we use the whole approximation to the conditional density function, $\hat\phi(\tau|\hat\tau_s)$. In the Simple Sum Procedure, we have for the operating characteristic of the item score $x_h$

(5.47)  $\hat{P}_{x_h}(\theta) = \hat{P}^{*}_{x_h}[\tau(\theta)] = \sum_{s\in x_h}\hat\phi(\tau|\hat\tau_s)\left[\sum_{s=1}^{N}\hat\phi(\tau|\hat\tau_s)\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ .
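The Simple Sum Procedure (5.47) reduces to a ratio of sums of density values. In the Python sketch below, `cond_density(tau, tau_hat_s)` is a hypothetical stand-in for $\hat\phi(\tau|\hat\tau_s)$; for illustration it is a normal density with mean $\hat\tau_s$ and standard deviation $C^{-1}$, which is not the Two-Parameter Beta, Normal Approach or Pearson System approximation used in the study. The data are likewise hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simple_sum_oc(tau, tau_hats, scores, m_h, cond_density):
    """Estimated operating characteristics [P_0, ..., P_m_h] at tau by (5.47).

    cond_density(tau, tau_hat_s) plays the role of phi-hat(tau | tau_hat_s);
    scores[s] is the item score x_h obtained by examinee s.
    """
    dens = [cond_density(tau, th) for th in tau_hats]
    total = sum(dens)                 # sum over all N examinees
    sums = [0.0] * (m_h + 1)
    for d, x in zip(dens, scores):
        sums[x] += d                  # sum over examinees s in x_h
    return [v / total for v in sums]


C = 4.0  # hypothetical constant: sd of the stand-in density is 1/C


def phi_hat(t, tau_hat_s):
    """Hypothetical stand-in for phi-hat(tau | tau_hat_s)."""
    return normal_pdf(t, tau_hat_s, 1.0 / C)


tau_hats = [-1.0, -0.5, 0.0, 0.4, 1.2]   # hypothetical MLEs
scores = [0, 0, 1, 1, 1]                 # binary item: 0 = failure, 1 = success
p = simple_sum_oc(0.0, tau_hats, scores, 1, phi_hat)
```

Evaluating `simple_sum_oc` over a grid of $\tau$ and mapping each $\tau$ back through $\theta = \tau^{-1}(\tau)$ traces out the whole estimated operating characteristic.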

This approach was used frequently in the present study. Among other things, it was used for comparing the results obtained upon several different Old Tests, which will be introduced in Chapter 6.

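It should be noted that we can write for the conditional density of $\tau$, given $\hat\tau_s$,

(5.48)  $\phi(\tau|\hat\tau_s) = \psi(\hat\tau_s|\tau)\,f^{*}(\tau)\left[\int_{-\infty}^{\infty}\psi(\hat\tau_s|\tau)\,f^{*}(\tau)\,d\tau\right]^{-1}$ ,

where $\psi(\hat\tau|\tau)$ is the conditional density of $\hat\tau$, given $\tau$, and $f^{*}(\tau)$ is the marginal density of $\tau$. From our simulated data, we can obtain this theoretical density function by using $n(\tau, C^{-1})$ for $\psi(\hat\tau_s|\tau)$, where $C$ is the target square root of the test information function, $[I^{*}(\tau)]^{1/2}$, and

(5.49)  $f^{*}(\tau) = f(\theta)\dfrac{d\theta}{d\tau} = f(\theta)\,C\left[\sum_{k=0}^{7}\alpha_k\theta^k\right]^{-1} = \begin{cases} 0.2\,C\left[\sum_{k=0}^{7}\alpha_k\theta^k\right]^{-1} & \text{for } \tau(-2.5) < \tau < \tau(2.5) \\ 0 & \text{otherwise.} \end{cases}$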


We can replace $\hat\phi(\tau|\hat\tau_s)$ in (5.47) by $\phi(\tau|\hat\tau_s)$ thus obtained, and the resultant function is called the criterion operating characteristic of item score $x_h$. This function is the limiting case that we can possibly attain by adopting the Simple Sum Procedure of the Conditional P.D.F. Approach upon a given set of data.
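The criterion conditional density of (5.48) can be computed numerically on a grid of $\tau$, as in the Python sketch below. It uses $n(\tau, C^{-1})$ for $\psi(\hat\tau_s|\tau)$ as described above, but, as an assumption for illustration, replaces the $f^*(\tau)$ of (5.49) by a uniform density of 0.2 on $(-2.5, 2.5)$ (the special case in which $\tau = \theta$), takes $C = 4$ hypothetically, and evaluates the normalizing integral by the trapezoidal rule. Substituting the resulting values for $\hat\phi(\tau|\hat\tau_s)$ in the Simple Sum Procedure would give the criterion operating characteristic.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def criterion_density(tau_grid, tau_hat_s, C, f_star):
    """phi(tau | tau_hat_s) on tau_grid by (5.48), with psi = n(tau, 1/C)."""
    # psi(tau_hat_s | tau): normal with mean tau, evaluated at tau_hat_s
    numer = [normal_pdf(tau_hat_s, t, 1.0 / C) * f_star(t) for t in tau_grid]
    norm = trapezoid(numer, tau_grid)      # the bracketed integral in (5.48)
    return [v / norm for v in numer]


def f_star(t):
    """Hypothetical marginal density: uniform, 0.2 on (-2.5, 2.5)."""
    return 0.2 if -2.5 < t < 2.5 else 0.0


tau_grid = [-6.0 + 0.01 * i for i in range(1201)]
post_center = criterion_density(tau_grid, 0.0, 4.0, f_star)
post_edge = criterion_density(tau_grid, 2.6, 4.0, f_star)
```

For $\hat\tau_s$ well inside the support the result is practically the normal density itself, whereas for $\hat\tau_s = 2.6$, outside the support, all of the mass is pushed below 2.5; this truncation is one reason why estimates behave differently near the ends of the ability range.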


Figures 5-13-1 through 5-13-3 present the three sets of estimated item characteristic functions of item 6, obtained by the Conditional P.D.F. Approach, in comparison with the theoretical item characteristic function, which is drawn by a thick, solid curve, and the frequency ratios of the correct answer, which are shown by the combination of long dashes and dots. These resultant estimated operating characteristics are based upon the Conditional P.D.F. Approach combined with the Two-Parameter Beta Method, Normal Approach Method and Pearson System Method, respectively. In each figure, the result obtained in Degree 3 Case is plotted by long dashes, and the one obtained in Degree 4 Case is drawn by short, thick dashes. There is a fifth curve, plotted by a thin, solid curve in each figure, i.e., the criterion item characteristic function of item 6. It is hard to single it out, however, because in each figure the three curves, i.e., the results of Degree 3 and 4 Cases and the criterion item characteristic function, are practically indistinguishable. This result is not unique to item 6, which we have chosen as an example more or less arbitrarily. In fact, for the interval of $\theta$, (-2.2, 2.1), the three curves are practically identical for each of the other nine binary test items, although outside of this interval of $\theta$ there are some discrepancies.

FIGURE 5-13-1

Estimated Item Characteristic Functions Obtained by the Conditional P.D.F. Approach with the Two-Parameter Beta Method, in Degree 3 (Long Dashes) and 4 (Short, Thick Dashes) Cases, in Comparison with the Criterion Item Characteristic Function (Thin, Solid Curve), the Frequency Ratios of the Correct Answer (Long Dashes and Dots), and the Theoretical Item Characteristic Function (Thick, Solid Curve).

FIGURE 5-13-2

Result of the Normal Approach Method, in Comparison with the Other Three.

FIGURE 5-13-3

Result of the Pearson System Method, in Comparison with the Other Three.

The above results indicate the high success of using any one of the three methods, i.e., the Two-Parameter Beta Method, Normal Approach Method and Pearson System Method, in approximating the conditional density function, $\phi(\tau|\hat\tau_s)$. We have investigated the fit of these curves further; some of the results are illustrated in Figures 5-13-4 through 5-13-6.

FIGURE 5-13-4

Regression of Ability $\theta$ on Its Maximum Likelihood Estimate $\hat\theta$ Based Upon the Original Old Test.

Figure 5-13-4 presents the regression, $E(\theta|\hat\theta)$, of ability $\theta$ on its maximum likelihood estimate $\hat\theta$, which is based upon the original Old Test, with the intervals of the standard error, $[\mathrm{Var.}(\theta|\hat\theta)]^{1/2}$, on each side, by dots. These values were obtained by

(5.50)  $E(\tau|\hat\tau) = \int_{-\infty}^{\infty}\tau\,\phi(\tau|\hat\tau)\,d\tau$

and

(5.51)  $\mathrm{Var.}(\tau|\hat\tau) = \int_{-\infty}^{\infty}\left[\tau - E(\tau|\hat\tau)\right]^{2}\phi(\tau|\hat\tau)\,d\tau$ ,

where $\phi(\tau|\hat\tau)$ is defined by (5.48), with $\tau$ replaced by $\theta$ and $\hat\tau$ by $\hat\theta$. In the same figure, the corresponding estimates in Degree 3 and 4 Cases are also presented by dashed and solid lines. We can see that these three sets of curves are practically identical for the interval of $\hat\theta$, (-2.2, 2.0), and diverge from one another outside of this interval. This result indicates a high accuracy in the estimation of the first and second conditional moments of $\theta$, given $\hat\theta$, which was done by (3.22) and (3.23), using the polynomial obtained by the method of moments as the estimated density function, $\hat g(\hat\theta)$, in both Degree 3 and 4 Cases. The differences between the two cases in their divergence from the true regression outside of the interval of $\hat\theta$, (-2.2, 2.0), are due to the differences between the two polynomials in these two regions, which are shown in Figure 4-1-2.
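Once a conditional density is tabulated on a grid, (5.50) and (5.51) are two numerical integrals. The Python sketch below uses the trapezoidal rule; the normal density used as input is a hypothetical example chosen so that the answer is known (mean 0.3, variance 0.25), not one of the densities behind Figure 5-13-4.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def conditional_moments(grid, density):
    """(5.50) and (5.51): conditional mean and variance of a tabulated density."""
    mean = trapezoid([t * d for t, d in zip(grid, density)], grid)
    var = trapezoid([(t - mean) ** 2 * d for t, d in zip(grid, density)], grid)
    return mean, var


grid = [-5.0 + 0.005 * i for i in range(2001)]
density = [normal_pdf(t, 0.3, 0.5) for t in grid]   # hypothetical input
mean, var = conditional_moments(grid, density)
std_error = math.sqrt(var)   # analogous to the half-width of the bands
```

The grid must extend far enough into both tails for the density to be negligible at the ends; otherwise both moments are biased toward the centre of the grid.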

From the result shown in Figure 5-13-4, we can expect that the fit of $\hat\phi(\theta|\hat\theta_s)$ to $\phi(\theta|\hat\theta_s)$ should be better for the interval of $\hat\theta$, (-2.2, 2.0), than for the range of $\hat\theta$ outside of this interval. Figures 5-13-5 and 5-13-6 present two examples of the fit of the estimated conditional density functions to the true density function, $\phi(\theta|\hat\theta_s)$. These two sets of results are for $s = 50$ and $s = 500$, whose maximum likelihood estimates, $\hat\theta_s$, are $-0.0066$ and $2.6346$, respectively. In both figures, the theoretical density, $\phi(\theta|\hat\theta_s)$, is drawn by a solid curve, and the estimated density functions, $\hat\phi(\theta|\hat\theta_s)$, obtained by the Normal Approach Method and the Two-Parameter Beta Method, are plotted by short and long dashes, respectively, in each of the Degree 3 and 4 Cases. In Figure 5-13-5, we can see that the $\hat\phi(\theta|\hat\theta_s)$ obtained by the Normal Approach Method is practically identical with the theoretical density function, while the one obtained by the Two-Parameter Beta Method is somewhat different, in each of Degree 3 and 4 Cases. In this example, the Pearson System Method directs us to the normal distribution in Degree 3 Case, and to the Type II Beta distribution in Degree 4 Case. The normal density curve in the left-hand graph of Figure 5-13-5, therefore, is also the result obtained by the Pearson System Method, and the one in the right-hand graph is practically identical with the one obtained by the Pearson System Method (… = 2.999). We can also see in Figure 5-13-5 that the two sets of results obtained for Degree 3 and 4 Cases are very close to each other.

FIGURE 5-13-5

Conditional Density of $\theta$, Given $\hat\theta$ (Solid Curve), and Its Estimates by the Normal Approach Method (Dotted Curve) and by the Two-Parameter Beta Method (Long Dashed Curve), for Degree 3 Case (Left) and Degree 4 Case (Right), Based upon the Original Old Test. $\hat\theta_s = \hat\theta_{50} = -0.0066$.

FIGURE 5-13-6

Conditional Density of $\theta$, Given $\hat\theta$ (Solid Curve), and Its Estimates by the Normal Approach Method (Dotted Curve) and by the Two-Parameter Beta Method (Long Dashed Curve), for Degree 3 Case (Left) and Degree 4 Case (Right), Based upon the Original Old Test. $\hat\theta_s = \hat\theta_{500} = 2.6346$.

In contrast to this, Figure 5-13-6 shows a poorer fit of $\hat\phi(\theta|\hat\theta_s)$ to its theoretical counterpart, $\phi(\theta|\hat\theta_s)$, in both Degree 3 and 4 Cases. The departure from the theoretical density function is greater for Degree 3 Case in both results obtained by the Two-Parameter Beta Method and the Normal Approach Method, which is anticipated from the greater divergence of the estimated regression of $\theta$ on $\hat\theta$ from the true regression in Degree 3 Case, as we have seen in Figure 5-13-4. The Pearson System Method directs us to the Type I Beta distribution ($\kappa = -0.010$, $\beta_1 = 0.000$, $\beta_2 = 2.990$) in Degree 3 Case, and the distribution is undefined in Degree 4 Case.

We sampled 42 examinees out of 493, and observed the fit of the estimated density functions to the true ones (cf. RR-78-2). As expected from Figure 5-13-4, in most cases the results turned out to be similar to the one for $s = 50$, which we have seen in Figure 5-13-5, and in a few cases in which $\hat\theta_s$ lies outside of the interval (-2.2, 2.0), the results were similar to the one for $s = 500$, which we have observed in Figure 5-13-6.

The Weighted Sum Procedure is an extension of the Simple Sum Procedure, in which the estimated operating characteristic, $\hat{P}_{x_h}(\theta)$, of the item response $x_h$ can be written as

(5.52)  $\hat{P}_{x_h}(\theta) = \sum_{s\in x_h} w(\hat\tau_s)\,\hat\phi(\tau|\hat\tau_s)\left[\sum_{s=1}^{N} w(\hat\tau_s)\,\hat\phi(\tau|\hat\tau_s)\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ ,

where $w(\hat\tau_s)$ is an appropriate weight assigned to the maximum likelihood estimate $\hat\tau_s$ of the individual examinee. The Simple Sum Procedure can be considered, therefore, as a special case of the Weighted Sum Procedure, in which $w(\hat\tau_s) = 1$ for all the individual examinees.
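A minimal Python sketch of (5.52), assuming, for illustration only, a normal density with mean $\hat\tau_s$ and standard deviation $C^{-1}$ as a stand-in for $\hat\phi(\tau|\hat\tau_s)$; the weights are passed in explicitly, and the data and weights below are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_sum_oc(tau, tau_hats, scores, m_h, cond_density, weights):
    """Estimated operating characteristics at tau by (5.52)."""
    terms = [w * cond_density(tau, th) for w, th in zip(weights, tau_hats)]
    total = sum(terms)                # weighted sum over all N examinees
    sums = [0.0] * (m_h + 1)
    for v, x in zip(terms, scores):
        sums[x] += v                  # weighted sum over s in x_h
    return [v / total for v in sums]


C = 4.0  # hypothetical constant for the stand-in density


def phi_hat(t, tau_hat_s):
    """Hypothetical stand-in for phi-hat(tau | tau_hat_s)."""
    return normal_pdf(t, tau_hat_s, 1.0 / C)


tau_hats = [-1.0, -0.5, 0.0, 0.4, 1.2]
scores = [0, 0, 1, 1, 1]
p_equal = weighted_sum_oc(0.0, tau_hats, scores, 1, phi_hat, [1.0] * 5)
p_biased = weighted_sum_oc(0.0, tau_hats, scores, 1, phi_hat,
                           [0.0, 0.0, 1.0, 1.0, 1.0])
```

With all weights equal to one, `p_equal` is exactly the Simple Sum result; `p_biased` shows how strongly the weights can move the estimate when they are unevenly distributed across the subpopulations.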

FIGURE 5-13-7

Estimated Density Functions of Ability $\theta$ Divided into Two Portions for the Success and the Failure Subpopulations for Item 6, Obtained by the Weighted Sum Procedure of the Conditional P.D.F. Approach with the Two-Parameter Beta Method, in Degree 3 (Dotted Curves) and 4 (Dashed Curves) Cases, in Comparison with the Theoretical Portions of the Density Function (Solid Curves), the Actual Frequencies of $\theta$ (Solid Lines with Diamonds), and the Portions for the Criterion Item Characteristic Function in the Simple Sum Procedure (Solid Curves with Crosses).

Figure 5-13-7 presents the estimated density functions of ability $\theta$, divided into two portions for the success and failure subpopulations for item 6, as the results of the Weighted Sum Procedure of the Conditional P.D.F. Approach combined with the Two-Parameter Beta Method. These results were obtained upon the original Old Test, using as the weight the area under the curve of $\hat g(\hat\theta)$ for the subinterval of $\hat\theta$ which starts at the midpoint between each $\hat\theta_s$ and the lower adjacent value of $\hat\theta$ and ends at the midpoint between $\hat\theta_s$ and the upper adjacent value of $\hat\theta$. The result of Degree 3 Case is plotted by dots and the one obtained in Degree 4 Case is drawn by dashes, in each of the two graphs of Figure 5-13-7. In this figure, the theoretical portions of the density of ability $\theta$ are drawn by solid curves, the actual frequencies of $\theta$ by solid lines with diamonds, and the functions which are the basis of the criterion item characteristic function in the Simple Sum Procedure by solid curves with crosses. We can see in this result that the estimated ability distributions deviate more from the true ability distributions in Degree 3 Case than in Degree 4 Case. This is true not only for item 6 but also for the other nine binary test items, and also for the results obtained by using the Pearson System Method instead of the Two-Parameter Beta Method. This divergence is due to the fact that we used the areas under the estimated density function, $\hat g(\hat\theta)$, as the weight, $w(\hat\theta_s)$, and the discrepancies of $\hat g(\hat\theta)$ from the true density function in Degree 3 Case are greater than those in Degree 4 Case, as we have seen in Figure 4-1-2.
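The weights described above can be computed as in the following Python sketch, assuming $\hat g(\hat\theta)$ is a polynomial given by its coefficients in increasing powers, so that each area is a difference of antiderivative values. The report does not say how the cells of the smallest and largest $\hat\theta_s$ are bounded; this sketch assumes, hypothetically, that they extend to the ends $a$ and $b$ of the range of $\hat g$. Tied values receive no special treatment, and the uniform density and data in the example are hypothetical.

```python
def poly_antideriv_eval(coefs, x):
    """Antiderivative (zero at 0) of sum(coefs[k] * x**k), evaluated at x."""
    return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coefs))


def area_weights(theta_hats, g_coefs, a, b):
    """Weight for each theta-hat_s: area under the polynomial g-hat over the
    cell bounded by the midpoints to its neighbours (a, b at the extremes)."""
    order = sorted(range(len(theta_hats)), key=lambda i: theta_hats[i])
    vals = [theta_hats[i] for i in order]
    last = len(vals) - 1
    weights = [0.0] * len(vals)
    for rank, i in enumerate(order):
        lo = a if rank == 0 else (vals[rank - 1] + vals[rank]) / 2.0
        hi = b if rank == last else (vals[rank] + vals[rank + 1]) / 2.0
        weights[i] = (poly_antideriv_eval(g_coefs, hi)
                      - poly_antideriv_eval(g_coefs, lo))
    return weights


# Hypothetical example: g-hat uniform (0.2) on [-2.5, 2.5]
weights = area_weights([0.5, -1.0, 1.5], [0.2], -2.5, 2.5)
```

Because the cells tile $[a, b]$, the weights sum to the total area under $\hat g$, so any error in $\hat g$ near the ends of the range is passed directly into the weights of the extreme examinees.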

Figures 5-13-8 and 5-13-9 present the resultant estimated item characteristic functions of item 6, in Degree 3 Case by dotted curves and in Degree 4 Case by long, dashed curves, which were obtained by the Pearson System Method and the Two-Parameter Beta Method, respectively. Also presented in these figures are the theoretical item characteristic function of item 6, the proportions correct of $\theta$, and the criterion item characteristic function obtained by the Simple Sum Procedure, by solid curves, solid lines with diamonds, and solid curves with crosses, respectively. We can see in these two figures that the results obtained in Degree 3 and 4 Cases are practically identical, in spite of the differences between the two sets of estimated portions of the density function of $\theta$, as we have seen in Figure 5-13-7. This turned out to be true for every binary test item for the interval of $\theta$, (-2.2, 2.2), in both results obtained by the Pearson System Method and by the Two-Parameter Beta Method. We also notice that these two sets of results obtained by the two different methods are very close to each other for this range of $\theta$, and, again, this is true for all the other nine binary test items. There are some discrepancies between these results and the criterion item characteristic function obtained by the Simple Sum Procedure, however. Since the estimated item characteristic function obtained by the Simple Sum Procedure with any one of the three methods, i.e., the Pearson System Method, Two-Parameter Beta Method and Normal Approach Method, is practically identical with the corresponding criterion item characteristic function for each of the ten binary test items, as we have observed earlier in this section, the above discrepancies also exist between the set of estimated item characteristic functions obtained by the Weighted Sum Procedure and the one obtained by the Simple Sum Procedure.

FIGURE 5-13-8

Estimated Item Characteristic Functions of Item 6 in Degree 3 (Dotted Curve) and 4 (Long, Dashed Curve) Cases, Obtained by the Weighted Sum Procedure of the Conditional P.D.F. Approach with the Pearson System Method, in Comparison with the Theoretical Item Characteristic Function (Solid Curve), the Frequency Ratios of $\theta$ (Solid Line with Diamonds), and the Criterion Item Characteristic Function in the Simple Sum Procedure (Solid Curve with Crosses).

FIGURE 5-13-9

Estimated Item Characteristic Functions of Item 6 in Degree 3 (Dotted Curve) and 4 (Long, Dashed Curve) Cases, Obtained by the Weighted Sum Procedure of the Conditional P.D.F. Approach with the Two-Parameter Beta Method, in Comparison with the Theoretical Item Characteristic Function (Solid Curve), the Frequency Ratios of $\theta$ (Solid Line with Diamonds), and the Criterion Item Characteristic Function in the Simple Sum Procedure (Solid Curve with Crosses).

In this example of item 6, we can see that the results obtained by the Simple Sum Procedure fit the true item characteristic function better than those obtained by the Weighted Sum Procedure. This fact cannot be generalized to all the other nine binary test items, however. For instance, for item 10, the results indicate that this order is reversed.


If we replace $\hat\phi(\tau|\hat\tau_s)$ in (5.52) by its theoretical counterpart, $\phi(\tau|\hat\tau_s)$, which is given by (5.48), we obtain a kind of criterion operating characteristic in the Weighted Sum Procedure. Since we still use the weight obtained from $\hat g(\hat\theta)$ in our example, we shall call it the pseudo-criterion item characteristic function. Actually, we can obtain more than one such function, depending upon the approximation used for $g(\hat\theta)$. We obtained three pseudo-criterion item characteristic functions for each of the ten binary test items, using the three polynomials of degrees 3, 4 and 5, which were obtained by the method of moments and are illustrated in Figure 4-1-2. These three pseudo-criterion item characteristic functions turned out to be very close to the two estimated item characteristic functions of Degree 3 and 4 Cases for each of the ten binary test items, a result which supports the usefulness of the three different methods of approximating the conditional density, $\phi(\tau|\hat\tau_s)$.

The Proportioned Sum Procedure has a somewhat different rationale from those of the other two procedures. Let $p(s\in x_h)$ be the probability with which the examinee $s$ belongs to the subpopulation $x_h$. We have for the estimated operating characteristic, $\hat{P}_{x_h}(\theta)$, of the item response $x_h$ to item $h$

(5.53)  $\hat{P}_{x_h}(\theta) = \sum_{s=1}^{N}\hat{p}(s\in x_h)\,\hat\phi(\tau|\hat\tau_s)\left[\sum_{s=1}^{N}\hat\phi(\tau|\hat\tau_s)\right]^{-1}$ ,  $x_h = 0, 1, \ldots, m_h$ ,

where $\hat{p}(s\in x_h)$ is the estimate of the probability $p(s\in x_h)$, which satisfies

(5.54)  $\sum_{x_h=0}^{m_h}\hat{p}(s\in x_h) = \sum_{x_h=0}^{m_h} p(s\in x_h) = 1$ .
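A minimal Python sketch of (5.53), with a hypothetical normal stand-in (mean $\hat\tau_s$, standard deviation $C^{-1}$) for $\hat\phi(\tau|\hat\tau_s)$; each row of the hypothetical `p_hat` holds $\hat p(s\in x_h)$ for $x_h = 0, \ldots, m_h$ and is checked against (5.54). With indicator rows (1 for the score actually obtained, 0 otherwise) the procedure reduces to the Simple Sum Procedure.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def proportioned_sum_oc(tau, tau_hats, p_hat, cond_density):
    """Estimated operating characteristics at tau by (5.53).

    p_hat[s][x] is the estimated probability that examinee s belongs to
    subpopulation x; every row must sum to one, as (5.54) requires.
    """
    for row in p_hat:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of p_hat must sum to 1, see (5.54)")
    dens = [cond_density(tau, th) for th in tau_hats]
    total = sum(dens)                      # denominator: all N examinees
    sums = [0.0] * len(p_hat[0])
    for d, row in zip(dens, p_hat):
        for x, prob in enumerate(row):
            sums[x] += prob * d            # every examinee contributes to every x
    return [v / total for v in sums]


C = 4.0  # hypothetical constant for the stand-in density


def phi_hat(t, tau_hat_s):
    """Hypothetical stand-in for phi-hat(tau | tau_hat_s)."""
    return normal_pdf(t, tau_hat_s, 1.0 / C)


tau_hats = [-1.0, -0.5, 0.0, 0.4, 1.2]
scores = [0, 0, 1, 1, 1]
p_indicator = [[1.0, 0.0] if x == 0 else [0.0, 1.0] for x in scores]
p_soft = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
oc_indicator = proportioned_sum_oc(0.0, tau_hats, p_indicator, phi_hat)
oc_soft = proportioned_sum_oc(0.0, tau_hats, p_soft, phi_hat)
```

Because each row sums to one, the estimated operating characteristics at any $\tau$ also sum to one, which is the role (5.54) plays in this procedure.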


FIGURE 5-13-10

Four Different Estimates of $p(s\in x_h = 1)$ for Item 6, i.e., the Proportions of the Examinees Who Answered Item 6 Correctly within the Interval $\hat\theta_s \pm \sigma$ (Solid Triangles), Those within the Interval $\hat\theta_s \pm 2\sigma$ (Crosses), and the Corresponding Results for Which the 61 Equally Spaced Values of $\hat\theta$ Were Used Instead of the 500 Values of $\hat\theta_s$ (Dots and Dashes, Respectively).

Figure 5-13-10 presents the four different estimates of $p(s\in x_h)$ for item 6 which were used in the present study. Our basic data are, again, the set of five hundred maximum likelihood estimates obtained upon the original Old Test. These four estimates of $p(s\in x_h)$ are the proportions of the examinees who belong to the subpopulation $x_h$ within a more or less arbitrarily chosen interval of $\hat\theta$. The first and second estimates, which are plotted by solid triangles and crosses, respectively, in Figure 5-13-10, are the proportions of the examinees who belong to the subpopulation $x_h = 1$ within the intervals $\hat\theta_s \pm \sigma$ and $\hat\theta_s \pm 2\sigma$, respectively, where $\sigma = 0.215$. The third and fourth ones, which are drawn by dots and dashes, respectively, are the same as the first two, but are assigned to the sixty-one equally spaced values of $\hat\theta$, instead of the five hundred observations, $\hat\theta_s$. We notice that these proportions themselves can be crude estimates of the operating characteristic of $x_h$, if we correct the scale of $\hat\theta$ using the method suggested in Section V.2. With our data, the ratio of the standard deviation of $\hat\theta$ to that of $\theta$ is only 1.011 (cf. RR-78-5), and the regression of $\theta$ on $\hat\theta$ is approximately linear for the interval of $\hat\theta$, (-2.2, 2.0) (cf. Figure 5-13-4). For these reasons, the item characteristic function of item 6 is drawn without correction in Figure 5-13-10, for a rough comparison.
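The window proportions described above can be sketched in Python as follows, taking $\sigma = 0.215$ from the text. The function name and the toy data are hypothetical, examinee $s$ is counted in its own window, and an optional list of centres stands in for the equally spaced values of $\hat\theta$; an empty window yields `None`.

```python
def window_proportions(theta_hats, correct, half_width, centers=None):
    """Share of examinees with |theta_hat - center| <= half_width who
    answered correctly, for each center (default: the theta_hats)."""
    if centers is None:
        centers = theta_hats
    out = []
    for c0 in centers:
        inside = [c for th, c in zip(theta_hats, correct)
                  if abs(th - c0) <= half_width]
        out.append(sum(inside) / len(inside) if inside else None)
    return out


sigma = 0.215                        # value given in the text
theta_hats = [0.0, 0.1, 0.5, 1.0]    # hypothetical maximum likelihood estimates
correct = [0, 1, 1, 1]               # 1 = answered item correctly
p1 = window_proportions(theta_hats, correct, sigma)       # within +/- sigma
p2 = window_proportions(theta_hats, correct, 2 * sigma)   # within +/- 2 sigma
```

The wider window smooths the proportions more heavily but also blurs them more across ability levels; the two choices used in the study bracket that trade-off.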

Figures 5-13-11 and 5-13-12 present the resultant estimated item characteristic functions of item 6 obtained by the Proportioned Sum Procedure combined with the Pearson System Method and the Two-Parameter Beta Method, respectively, using the first two $\hat p(s\in x_h)$'s, for Degree 3 and 4 Cases. In these figures, the results obtained by using the first and second $\hat p(s\in x_h = 1)$'s for Degree 3 Case are plotted by dots and medium dashes, and those for Degree 4 Case are drawn by short and long dashes, respectively, together with the theoretical item characteristic function of item 6, the proportions correct of $\theta$, and the criterion item characteristic function obtained by the Simple Sum Procedure, which are drawn by solid curves, lines with diamonds, and curves with crosses, respectively. We can see in each of these two figures that the four results are very close to each other, and also to the criterion item characteristic function obtained by the Simple Sum Procedure, for the interval of $\theta$, (-2.5, 2.5). This is a common tendency among all the ten binary test items, although for some items they are not as close as those for item 6. It is also noted that these two sets of results obtained by the Pearson System Method and by the Two-Parameter Beta Method are very close to each other. This tendency is common to all the ten binary test items.

If we replace $\hat\phi(\tau|\hat\tau_s)$ in (5.53) by the true density, $\phi(\tau|\hat\tau_s)$, we can obtain the pseudo-criterion operating characteristic of $x_h$. In the present study, four different pseudo-criterion item characteristic functions were obtained, using the four different estimates of $p(s\in x_h = 1)$ which we have observed in Figure 5-13-10. The resultant pseudo-criterion item characteristic functions turned out to be very close to the estimated item characteristic functions obtained by using the same $\hat p(s\in x_h = 1)$, for each of the ten binary test items, a fact which supports the usefulness of both the Pearson System Method and the Two-Parameter Beta Method.

FIGURE 5-13-11

Estimated Item Characteristic Functions of Item 6 Obtained by the Proportioned Sum Procedure of the Conditional P.D.F. Approach with the Pearson System Method, by Using the Proportions for $\hat\theta_s \pm \sigma$ in Degree 3 (Dots) and 4 (Short Dashes) Cases, and by Using Those for $\hat\theta_s \pm 2\sigma$ in Degree 3 (Medium Dashes) and 4 (Long Dashes) Cases, Respectively. They Are Compared with the Theoretical Item Characteristic Function (Solid Curve), the Frequency Ratios of $\theta$ (Solid Line with Diamonds), and the Criterion Item Characteristic Function in the Simple Sum Procedure (Solid Curve with Crosses).

The estimated ability distributions for the success and the failure subpopulations for each item turned out to be very similar to those obtained by the other combinations of an approach and a method, for both Degree 3 and 4 Cases.

FIGURE 5-13-12

Estimated Item Characteristic Functions of Item 6 Obtained by the Proportioned Sum Procedure of the Conditional P.D.F. Approach with the Two-Parameter Beta Method, by Using the Proportions for $\hat\theta_s \pm \sigma$ in Degree 3 (Dots) and 4 (Short Dashes) Cases, and by Using Those for $\hat\theta_s \pm 2\sigma$ in Degree 3 (Medium Dashes) and 4 (Long Dashes) Cases, Respectively. They Are Compared with the Theoretical Item Characteristic Function (Solid Curve), the Frequency Ratios of $\theta$ (Solid Line with Diamonds), and the Criterion Item Characteristic Function in the Simple Sum Procedure (Solid Curve with Crosses).

Figure 5-13-13 presents the estimated density functions for the total population, which were obtained by the Two-Parameter Beta Method, using $\theta_s \pm 0.215$ as the interval for computing $p(s \mid x_g = 1)$, in the Degree 3 and 4 cases, by dotted and dashed curves, respectively, together with the theoretical density, $f(\theta)$. We can see in this figure that these two results are close to each other, and reasonably close to the uniform density. The corresponding results obtained by using the interval $\theta_s \pm 0.430$ turned out to be very close to these results.


FIGURE 5-13-13

Estimated Density Functions of Ability $\theta$, Obtained by the Proportioned Sum Procedure of the Conditional P.D.F. Approach with the Two-Parameter Beta Method, by Using the Proportions for the Interval, $\theta_s \pm \sigma$, in Degree 3 (Dots) and Degree 4 (Short Dashes) Cases, in Comparison with the Theoretical Density Function.

We have also obtained the corresponding four estimated density functions by the Pearson System Method. The results turned out to be fairly close to those obtained by the Two-Parameter Beta Method. In fact, all the other results obtained by the other approaches turned out to be similar, with some deviations, i.e., some of them are a little closer to the theoretical density function, and some of them are a little less close.

(V. 14) Remark on the Approximation of $\phi(\tau \mid \hat{\tau})$ by a Normal Density Function

We have seen in the previous sections that, in spite of the relatively restricted shape of the normal density function, the Normal Approach Method works just as well as the other two methods, i.e., the Pearson System Method and the Two-Parameter Beta Method, in approximating the conditional density function, $\phi(\tau \mid \hat{\tau})$. There is a good reason behind this fact, which we shall examine in this section.

Suppose that the density function, $f^*(\tau)$, is uniform on a certain interval of $\tau$, $[\underline{\tau}, \bar{\tau}]$. Then we can write


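$$
\phi(\tau \mid \hat{\tau}) = \psi(\hat{\tau} \mid \tau)\, f^*(\tau) \left[\int \psi(\hat{\tau} \mid \tau)\, f^*(\tau)\, d\tau\right]^{-1} = \psi(\hat{\tau} \mid \tau) \left[\int_{\underline{\tau}}^{\bar{\tau}} \psi(\hat{\tau} \mid \tau)\, d\tau\right]^{-1}, \qquad \underline{\tau} < \tau < \bar{\tau}. \tag{5.55}
$$

Since we have

$$
\psi(\hat{\tau} \mid \tau) = (2\pi)^{-1/2} \sigma^{-1} \exp\!\left[-\frac{(\hat{\tau} - \tau)^2}{2\sigma^2}\right] = (2\pi)^{-1/2} \sigma^{-1} \exp\!\left[-\frac{(\tau - \hat{\tau})^2}{2\sigma^2}\right], \tag{5.56}
$$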


from this and (5.55), we find that $\phi(\tau \mid \hat{\tau})$ is a truncated normal density function. When $\sigma$ is small, for a wide range of $\hat{\tau}$ (all values not too close to $\underline{\tau}$ or $\bar{\tau}$), this is practically equal to the complete normal density function given by the rightmost side of (5.56). The Normal Approach Method, therefore, must work well in this situation.
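A quick numerical check of this argument is sketched below in Python. It evaluates the truncated normal $\phi(\tau \mid \hat{\tau})$ of (5.55), with $\psi$ taken from (5.56) and $f^*$ uniform on an assumed interval, and compares it with the complete normal density. The function names, the interval (-3, 3), and $\sigma = 0.3$ are illustrative assumptions, not values used in the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation, evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (math.sqrt(2.0 * math.pi) * sd)


def truncated_conditional(tau, tau_hat, sigma, lo, hi, n_grid=2000):
    """phi(tau | tau_hat) of (5.55) when f*(tau) is uniform on [lo, hi].

    psi(tau_hat | tau) is the normal density of (5.56); the normalizing
    integral over [lo, hi] is approximated by the trapezoidal rule.
    """
    if not lo < tau < hi:
        return 0.0
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * normal_pdf(tau_hat, lo + k * h, sigma)
    return normal_pdf(tau_hat, tau, sigma) / (total * h)


# Well inside the interval the truncation is invisible ...
print(truncated_conditional(0.1, 0.0, 0.3, -3.0, 3.0), normal_pdf(0.1, 0.0, 0.3))
# ... but near an end point the truncated density is visibly inflated.
print(truncated_conditional(2.8, 2.9, 0.3, -3.0, 3.0), normal_pdf(2.8, 2.9, 0.3))
```

With $\hat{\tau} = 0$, ten standard deviations from either end, the two densities agree to many decimal places. At $\hat{\tau} = 2.9$ the normalizing integral is only about $\Phi(1/3) \approx 0.63$, so the truncated density is roughly 1.6 times the complete one.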

If the marginal density, $f^*(\tau)$, is a normal density function with $\mu$ and $\delta$ as its two parameters, then the joint distribution of $\tau$ and $\hat{\tau}$ will be the bivariate normal distribution, with $\mu$ and $(\sigma^2 + \delta^2)^{1/2}$ as the two parameters for the marginal density function, $g(\hat{\tau})$, and

$$
\rho = \delta\,(\sigma^2 + \delta^2)^{-1/2} \tag{5.57}
$$

as the fifth parameter. Thus the conditional density, $\phi(\tau \mid \hat{\tau})$, is a normal density function, with $(\delta^2 \hat{\tau} + \sigma^2 \mu)(\sigma^2 + \delta^2)^{-1}$ and $\sigma\delta\,(\sigma^2 + \delta^2)^{-1/2}$ as the two parameters.
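The closed-form parameters stated after (5.57) can be checked against a brute-force application of Bayes' rule. The Python sketch below assumes $\mu = 0$, $\delta = 1$, $\sigma = 0.3$, and $\hat{\tau} = 1.5$ purely for illustration; the function names are hypothetical.

```python
import math


def conditional_params(tau_hat, mu, delta, sigma):
    """Mean and standard deviation of phi(tau | tau_hat) in the bivariate normal case."""
    s2 = sigma ** 2 + delta ** 2
    mean = (delta ** 2 * tau_hat + sigma ** 2 * mu) / s2
    sd = sigma * delta / math.sqrt(s2)
    return mean, sd


def numeric_conditional_params(tau_hat, mu, delta, sigma, n_grid=20000, width=10.0):
    """Same quantities from f*(tau) psi(tau_hat | tau), normalized on a fine grid of tau."""
    lo = mu - width * delta
    h = 2.0 * width * delta / n_grid
    taus = [lo + k * h for k in range(n_grid + 1)]
    # Unnormalized density: normal prior N(mu, delta^2) times psi of (5.56);
    # constant factors cancel in the ratios below.
    weights = [math.exp(-0.5 * ((t - mu) / delta) ** 2 - 0.5 * ((tau_hat - t) / sigma) ** 2)
               for t in taus]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(taus, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(taus, weights)) / total
    return mean, math.sqrt(var)


print(conditional_params(1.5, 0.0, 1.0, 0.3))
print(numeric_conditional_params(1.5, 0.0, 1.0, 0.3))
```

Both calls give a mean of about 1.376 (that is, $1.5/1.09$) and a standard deviation of about 0.287; the conditional mean is pulled from $\hat{\tau}$ toward $\mu$ by the factor $\delta^2/(\sigma^2 + \delta^2)$.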

These two facts indicate that, if the distribution of $\tau$ is close to either a normal distribution or a uniform distribution, or somewhere between the two, the Normal Approach Method will work well in approximating the conditional density function, $\phi(\tau \mid \hat{\tau})$.




VI  Estimation  of  the  Operating  Characteristics  of  the  Discrete 
Item  Responses  and  That  of  Ability  Distributions;  III 

Following Chapters 3 and 5, in the present chapter we shall integrate the results and findings of the part of our research under this title. The chapter also includes certain observations about tests in general, objective testing, the ethics behind Bayesian estimation, and some other related topics. The main subject of the present chapter is to find out how small the number of test items in our Old Test can be. Alternative estimators for the maximum likelihood estimator will be introduced, which can be used when the amount of test information of our Old Test is not large enough for the entire range of ability $\theta$ of our interest and, consequently, more than a few of the maximum likelihood estimates of ability of our examinees are positive and/or negative infinities.

(VI. 1) Objective Testing and Exchangeability

Equal opportunity has been considered ethical in our society. In personnel selection, for example, we are supposed to base our decisions upon the applicants' capabilities, not upon their ethnic backgrounds, ages, sexes, and other attributes which have little to do with their capabilities for a specified job. The translation of this equal opportunity principle to testing will be that we should: 1) develop and use valid tests for the selection purpose; 2) objectively analyze the results of the tests; and 3) make our recommendations as to which applicants should be accepted and which should be rejected on the basis of these objective findings only.

Although the above first and third statements are readily accepted by people in general, including researchers, for some reason the second statement has attracted little attention. Note, however, that this is the part for which researchers should be most responsible.

Bayesian estimation of ability has been accepted by researchers as a valid method for many years. This fact does not justify, however, certain serious flaws of Bayesian estimation, which clearly go against the principle of objective testing. It assumes the exchangeability of individuals who belong to a certain subpopulation, and uses the ability distribution of that subpopulation as the prior.

Let us assume that we have two ethnic groups, A and B. Figure 6-1-1 presents the priors of these two hypothetical ethnic subpopulations with respect to ability $\theta$. The basic idea behind Bayesian estimation is that, within each ethnic subpopulation, A or B, the individuals are exchangeable. Are they really? Suppose that we fix the level of $\theta$ at $\theta_0$, as is indicated in Figure 6-1-1. If we consider the subset of individuals whose ability levels are uniformly $\theta_0$, it will include certain people from the ethnic group A, and also certain other people who belong to B. Our best common sense tells us that these individuals are the people who are exchangeable. In Bayesian estimation, however, they are not; in its logic, those who belong to A are exchangeable among themselves, and so are those who belong to B.

FIGURE 6-1-1

Density Functions of the Ability Distributions of Two Hypothetical Ethnic Groups, A and B.

In order to observe this issue from a somewhat different angle, we shall consider a real test, LIS-U (Indow and Samejima, 1962, 1966; Samejima, 1969, RR-80-3). Figure 6-1-2 presents the test information function of LIS-U and its square root by solid and dotted curves, respectively.

FIGURE 6-1-2

Test Information Function (Solid Line) and Its Square Root (Dotted Line) of LIS-U.

The test consists of seven binary items, which make a fairly short test. The Bayes modal estimator (Samejima, 1969), $\hat{\theta}_V$, of ability $\theta$ is the modal point of $\theta$ for the function $B_V(\theta)$ such that

$$
B_V(\theta) = P_V(\theta)\, f(\theta). \tag{6.1}
$$

This estimator was adopted as the estimator of ability $\theta$, using the density function of the ability distribution, $f(\theta)$, as the prior, for each of the $2^7 = 128$ response patterns. The regression of $\hat{\theta}$ on ability $\theta$ is given by

$$
E[\hat{\theta} \mid \theta] = \sum_V \hat{\theta}_V\, P_V(\theta). \tag{6.2}
$$
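To make (6.1) and (6.2) concrete, the following Python sketch computes Bayes modal estimates for every response pattern of a seven-item binary test and the resulting regression $E[\hat{\theta} \mid \theta]$. The logistic model with $D = 1.7$, the item parameters in `A` and `B`, and the normal priors are illustrative assumptions; they are not the LIS-U parameters, so the numbers will not reproduce Figure 6-1-3.

```python
import itertools
import math

D = 1.7  # scaling constant of the logistic model
# Hypothetical item parameters for a seven-item binary test (not LIS-U).
A = [1.0, 1.2, 0.8, 1.5, 0.8, 1.2, 1.0]     # discrimination
B = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]  # difficulty
PATTERNS = list(itertools.product((0, 1), repeat=len(A)))  # all 2^7 = 128 patterns


def icc(theta, a, b):
    """Logistic item characteristic function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def pattern_prob(pattern, theta):
    """Operating characteristic P_V(theta) of a binary response pattern V."""
    p = 1.0
    for x, a, b in zip(pattern, A, B):
        pg = icc(theta, a, b)
        p *= pg if x == 1 else 1.0 - pg
    return p


def bayes_modal(pattern, prior_mean, prior_sd, lo=-10.0, hi=10.0, tol=1e-10):
    """Mode of B_V(theta) = P_V(theta) f(theta) for a normal prior f, as in (6.1).

    d/dtheta log B_V is strictly decreasing here, so its root is found by bisection.
    """
    def slope(theta):
        score = sum(D * a * (x - icc(theta, a, b)) for x, a, b in zip(pattern, A, B))
        return score - (theta - prior_mean) / prior_sd ** 2

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def regression(theta, prior_mean, prior_sd):
    """E[theta_hat | theta] = sum over V of theta_hat_V P_V(theta), as in (6.2)."""
    return sum(bayes_modal(v, prior_mean, prior_sd) * pattern_prob(v, theta)
               for v in PATTERNS)


# Same true ability, two different priors: two different expected estimates.
print(regression(2.0, -1.0, 1.0), regression(2.0, 1.0, 1.0))
```

Because the two priors differ only in their means, the second value exceeds the first even though both examinees have $\theta = 2.0$. The all-correct pattern also shows the limitation discussed in Section VI.2: `bayes_modal((1,) * 7, 0.0, 1.0)` is finite, and where it lands is governed by the prior as much as by the test.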




Figure 6-1-3 presents the four regressions of $\hat{\theta}$ on $\theta$ using the four different priors, N(0.0, 1.0), N(-1.0, 1.0), N(1.0, 1.0) and N(0.0, 0.5), by solid, long dashed, short dashed and dotted curves, respectively. Let us assume that the second prior is for the ethnic group A, the third prior is for the ethnic group B, and $\theta_0 = 2.0$. We can see in this figure that, for two individuals whose ability levels are uniformly $\theta_0$ but who belong to the ethnic groups A and B, respectively, the distributions of the Bayes modal estimate, $\hat{\theta}$, are different, and their expectations are approximately 1.0 and 1.6, respectively: a substantial difference!

FIGURE 6-1-3

Four Regressions of the Bayes Modal Estimate on Ability $\theta$ for LIS-U, with the Priors N(0.0, 1.0) (Solid Line), N(-1.0, 1.0) (Long Dashed Line), N(1.0, 1.0) (Short Dashed Line), and N(0.0, 0.5) (Dotted Line), Respectively.


Let us assume, further, that the first and the fourth priors are for men and women, respectively. If the first individual of the above two happens to be male, then, using N(0.0, 1.0) as the prior, his expected Bayes modal estimate is approximately 1.3. Which should we take for this first individual, 1.0 or 1.3, as the expected value of his ability estimate? This individual will obtain a higher score if the prior for men is used than if that of the ethnic group A is used. Perhaps he would rather be treated as a man than as a member of the ethnic group A. If the second individual happens to be female, then her expected Bayes modal estimate becomes approximately 0.7. Again, there are two expected values for her, 1.6 and 0.7, and she will obtain a higher score if she is categorized as a member of the ethnic group B rather than as a woman. If we use the second set of priors (men and women) for the two individuals, the expected Bayes modal estimates are 1.3 and 0.7, i.e., the reverse of the order of 1.0 and 1.6! Thus, if we take the first set of priors (ethnic groups A and B) in selection, then we will be saying, "If there are two people whose ability levels are exactly the same and at 2.0, then we will accept the one from the ethnic group B." If we take the second set of priors, then we will be saying, "If there are two such individuals who happen to be male and female, then we will accept the man and reject the woman." We will be very likely to accept the second individual if we take the first set of priors, and, if we take the second set, then it will be highly probable that we accept the first individual and reject the second. This is what it amounts to when we use a Bayesian estimator of ability in our selection.

A solution for this chaos will be to divide each ethnic group further, to make four groups instead of two, i.e., ethnic A and male, ethnic B and male, ethnic A and female, and ethnic B and female. It should be noted, however, that every individual has many more than two casual attributes like his or her ethnic background and sex, and similar problems will happen for these four groups. Then we may need eight groups instead of four, then sixteen, thirty-two, etc. In this way, we will reach, fairly soon, the conclusion that each individual has his or her own prior, or each prior includes only one individual. Then Bayesian estimation may finally be justifiable and useful. In such a case, however, why do we need testing at all if we know each individual's ability so well? In most cases we do not, and that is why we need testing.

The flaw of Bayesian estimation comes from the fact that it deals with a group of individuals who are not exchangeable as if they were exchangeable, and treats those who are exchangeable, i.e., individuals whose ability levels are exactly the same, as if they were not exchangeable. This is against the principle of objective testing. It is a typical example of failure in objectively analyzing the results of testing.

(VI. 2)  Every  Test  Has  a  Limitation 

We can see in Figure 6-1-3 that, for values of ability $\theta$ greater than approximately 1.0 and also for those less than approximately -1.0, there is little change in the regression of $\hat{\theta}$ on $\theta$, for each of the four different priors. In fact, the conditional distribution of the Bayes modal estimate, $\hat{\theta}$, given $\theta$, approaches the one-point distribution at the modal point of $\theta$ for the product $P_{V\text{-min}}(\theta)\, f(\theta)$ as $\theta$ tends to negative infinity, and it approaches the one-point distribution at the modal point of $\theta$ for the product $P_{V\text{-max}}(\theta)\, f(\theta)$ as $\theta$ tends to positive infinity, where V-min and V-max indicate the two extreme response patterns, $(0, 0, \ldots, 0)$ and $(m_1, m_2, \ldots, m_n)$. (As $\theta \to -\infty$, $P_{V\text{-min}}(\theta) \to 1$, so almost every examinee obtains V-min and therefore receives the same estimate; the case $\theta \to +\infty$ is symmetric.) This means that for these outside ranges of ability $\theta$ LIS-U is powerless, and it is the prior that takes the essential role in determining the value of the Bayes modal estimate. It is as if the examinee were cheated, obtaining something other than the information the test itself has provided.

We  must  accept  the  fact  that  every  test  has  a  limitation  as 
to  the  range  of  ability  which  it  can  measure.  Escaping  to  priors  will 
by  no  means  enhance  this  range,  but  will  impose  the  bias  which  was 
described  in  the  preceding  section.  No  single  test  has  an  infinite 
number  of  test  items,  so  it  should  not  be  expected  that  any  test  can 
measure  an  unlimited  range  of  ability. 




(VI. 3)  Alternative  Estimators  for  the  Maximum  Likelihood  Estimator 

A question will arise as to whether there is any way to enhance the range of ability for which a specified test is powerful, without depending upon priors or any other sources of irrelevant information. This can be done by replacing the negative and positive infinities of the maximum likelihood estimates for the two extreme response patterns, V-min and V-max, by some appropriate finite numbers. In search of such alternative estimators, our goal was to find suitable substitutes which do not depend upon any specific populations of examinees, but are population-free, unlike Bayesian estimators.

It is desirable that such alternative estimators provide us with conditional unbiasedness, given $\theta$, as is the case with the maximum likelihood estimator in the limiting situation where we have infinitely many test items. We notice that the operating characteristic $P_{V\text{-min}}(\theta)$ strictly decreases in $\theta$, and $P_{V\text{-max}}(\theta)$ strictly increases in $\theta$, as long as our test items follow a model, or models, like the normal ogive and logistic models. Thus we can conceive of a critical point, $\theta_c$, which satisfies

$$
\begin{cases}
P_{V\text{-min}}(\theta) \approx 0 & \text{for } \theta > \theta_c \\
P_{V\text{-max}}(\theta) \approx 0 & \text{for } \theta \le \theta_c
\end{cases}
\tag{6.3}
$$


Figure 6-3-1 presents the operating characteristics of the two extreme response patterns, $P_{V\text{-min}}(\theta)$ and $P_{V\text{-max}}(\theta)$, of LIS-U, by solid and dotted curves, respectively. The critical value, $\theta_c$, was obtained in such a way that the product of these two operating characteristics is maximal at this point. It turned out to be -0.0088.
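For logistic items, the critical value can be located without a grid search. The logarithm of $P_{V\text{-min}}(\theta)\, P_{V\text{-max}}(\theta)$ is $\sum_g [\log P_g(\theta) + \log(1 - P_g(\theta))]$, a concave function whose derivative $\sum_g D a_g [1 - 2P_g(\theta)]$ is strictly decreasing, so its unique root is $\theta_c$. The Python sketch below finds that root by bisection; the logistic model, $D = 1.7$, and the item parameters passed in are assumptions for illustration.

```python
import math

D = 1.7  # scaling constant of the logistic model


def icc(theta, a, b):
    """Logistic item characteristic function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def critical_value(a_params, b_params, lo=-10.0, hi=10.0, tol=1e-10):
    """theta_c maximizing P_{V-min}(theta) * P_{V-max}(theta) for binary logistic items."""
    def slope(theta):
        # derivative of log P_{V-min}(theta) + log P_{V-max}(theta)
        return sum(D * a * (1.0 - 2.0 * icc(theta, a, b))
                   for a, b in zip(a_params, b_params))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical seven-item test whose items are arranged symmetrically about zero.
print(critical_value([1.0, 1.2, 0.8, 1.5, 0.8, 1.2, 1.0],
                     [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]))
```

For items arranged symmetrically about zero, as in this example, $\theta_c$ is zero up to the bisection tolerance; the LIS-U value of -0.0088 depends on that test's actual item parameters.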

FIGURE 6-3-1

Operating Characteristics of the Two Extreme Response Patterns, (0,0,0,0,0,0,0) (Solid Line) and (1,1,1,1,1,1,1) (Dotted Line), of LIS-U, and the Position of the Critical Value $\theta_c$.

We shall aim at finding finite substitutes for the two maximum likelihood estimates, $\hat{\theta}_{V\text{-min}}$ and $\hat{\theta}_{V\text{-max}}$, which are negative and positive infinities, respectively, in such a way that the substitution provides us with a regression which is close enough to $\theta$, i.e., the unbiasedness of the estimator, for some range of $\theta$. Let $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$ denote such estimates, and $\theta^*_V$ be the resultant estimator, such that

$$
\theta^*_V =
\begin{cases}
\theta^*_{V\text{-min}} & \text{for } V = V\text{-min} \\
\theta^*_{V\text{-max}} & \text{for } V = V\text{-max} \\
\hat{\theta}_V & \text{for all the other response patterns.}
\end{cases}
\tag{6.4}
$$


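We can write the regression of $\theta^*$ on ability $\theta$ such that

$$
\begin{aligned}
E(\theta^* \mid \theta) &= \sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V P_V(\theta) + \theta^*_{V\text{-min}} P_{V\text{-min}}(\theta) + \theta^*_{V\text{-max}} P_{V\text{-max}}(\theta) \\
&\approx
\begin{cases}
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V P_V(\theta) + \theta^*_{V\text{-min}} P_{V\text{-min}}(\theta) & \text{for } \theta \le \theta_c \\
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V P_V(\theta) + \theta^*_{V\text{-max}} P_{V\text{-max}}(\theta) & \text{for } \theta > \theta_c.
\end{cases}
\end{aligned}
\tag{6.5}
$$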


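If this estimator, $\theta^*_V$, provides us with approximate unbiasedness for a certain range of $\theta$, $(\underline{\theta}, \bar{\theta})$, then we shall be able to write

$$
\begin{cases}
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V P_V(\theta) + \theta^*_{V\text{-min}} P_{V\text{-min}}(\theta) \approx \theta & \text{for } \underline{\theta} < \theta \le \theta_c \\
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V P_V(\theta) + \theta^*_{V\text{-max}} P_{V\text{-max}}(\theta) \approx \theta & \text{for } \theta_c < \theta < \bar{\theta}.
\end{cases}
\tag{6.6}
$$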


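In practice, we must search for the interval of $\theta$, $(\underline{\theta}, \bar{\theta})$, for which such an estimator, $\theta^*_V$, is available, in relation to the specific test of our interest. Integrating both sides of (6.6) over $(\underline{\theta}, \theta_c)$ and over $(\theta_c, \bar{\theta})$, respectively, we can further write

$$
\begin{aligned}
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V \int_{\underline{\theta}}^{\theta_c} P_V(\theta)\, d\theta + \theta^*_{V\text{-min}} \int_{\underline{\theta}}^{\theta_c} P_{V\text{-min}}(\theta)\, d\theta &\approx \tfrac{1}{2}\left(\theta_c^2 - \underline{\theta}^2\right) \\
\sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V \int_{\theta_c}^{\bar{\theta}} P_V(\theta)\, d\theta + \theta^*_{V\text{-max}} \int_{\theta_c}^{\bar{\theta}} P_{V\text{-max}}(\theta)\, d\theta &\approx \tfrac{1}{2}\left(\bar{\theta}^2 - \theta_c^2\right)
\end{aligned}
\tag{6.7}
$$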


Thus the two estimates, $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$, can be obtained by


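$$
\begin{aligned}
\theta^*_{V\text{-min}} &= \Big[\tfrac{1}{2}\left(\theta_c^2 - \underline{\theta}^2\right) - \sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V \int_{\underline{\theta}}^{\theta_c} P_V(\theta)\, d\theta\Big] \Big[\int_{\underline{\theta}}^{\theta_c} P_{V\text{-min}}(\theta)\, d\theta\Big]^{-1} \\
\theta^*_{V\text{-max}} &= \Big[\tfrac{1}{2}\left(\bar{\theta}^2 - \theta_c^2\right) - \sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V \int_{\theta_c}^{\bar{\theta}} P_V(\theta)\, d\theta\Big] \Big[\int_{\theta_c}^{\bar{\theta}} P_{V\text{-max}}(\theta)\, d\theta\Big]^{-1}
\end{aligned}
\tag{6.8}
$$

with some appropriate values for $\underline{\theta}$ and $\bar{\theta}$.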
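Formula (6.8) needs only the finite maximum likelihood estimates of the non-extreme patterns and a few one-dimensional integrals, so it is easy to evaluate for a short test. The Python sketch below does this for a hypothetical seven-item logistic test, using Simpson's rule for the integrals and bisection for both the maximum likelihood estimates and $\theta_c$. The model, the item parameters, and the helper names are assumptions; this is not the LIS-U computation itself.

```python
import itertools
import math

D = 1.7
# Hypothetical item parameters (not LIS-U), symmetric about theta = 0.
A = [1.0, 1.2, 0.8, 1.5, 0.8, 1.2, 1.0]
B = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]


def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def pattern_prob(pattern, theta):
    """Operating characteristic P_V(theta) of a binary response pattern."""
    p = 1.0
    for x, a, b in zip(pattern, A, B):
        pg = icc(theta, a, b)
        p *= pg if x == 1 else 1.0 - pg
    return p


def bisect_root(f, lo, hi, tol=1e-10):
    """Root of a strictly decreasing function f on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def mle(pattern):
    """Finite maximum likelihood estimate of a pattern that is neither all 0 nor all 1."""
    return bisect_root(lambda t: sum(D * a * (x - icc(t, a, b))
                                     for x, a, b in zip(pattern, A, B)), -30.0, 30.0)


def critical_value():
    """theta_c maximizing P_{V-min} P_{V-max}."""
    return bisect_root(lambda t: sum(D * a * (1.0 - 2.0 * icc(t, a, b))
                                     for a, b in zip(A, B)), -10.0, 10.0)


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return s * h / 3.0


def extreme_substitutes(lo, hi):
    """theta*_{V-min} and theta*_{V-max} of (6.8) for the interval (lo, hi)."""
    v_min, v_max = (0,) * len(A), (1,) * len(A)
    tc = critical_value()
    low_sum = high_sum = 0.0
    for v in itertools.product((0, 1), repeat=len(A)):
        if v in (v_min, v_max):
            continue
        est = mle(v)
        low_sum += est * simpson(lambda t: pattern_prob(v, t), lo, tc)
        high_sum += est * simpson(lambda t: pattern_prob(v, t), tc, hi)
    t_min = (0.5 * (tc ** 2 - lo ** 2) - low_sum) / simpson(
        lambda t: pattern_prob(v_min, t), lo, tc)
    t_max = (0.5 * (hi ** 2 - tc ** 2) - high_sum) / simpson(
        lambda t: pattern_prob(v_max, t), tc, hi)
    return t_min, t_max


print(extreme_substitutes(-1.5, 1.5))
```

Calling `extreme_substitutes` with several intervals in turn mimics the experiment summarized in Table 6-3-1. Because the illustrative items are symmetric about zero, the two returned values are mirror images of each other.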

We used eleven different sets of $\underline{\theta}$ and $\bar{\theta}$, ±1.50, ±1.75, ±2.00, ±2.25, ±2.50, ±3.00, ±3.50, ±4.00, ±4.50, ±5.00, and ±5.50, for the purpose of experimentation. The resultant set of estimates, $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$, obtained by using each of these eleven intervals, is given in Table 6-3-1. Figure 6-3-2 illustrates the regressions of $\theta^*$ on $\theta$ obtained by using (-1.5, 1.5) and (-2.25, 2.25), respectively, as $(\underline{\theta}, \bar{\theta})$, by solid and dashed curves. The values of $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$ turned out to be -1.47883 and 1.52237 in the former case, and -1.79255 and 1.77649 in the latter, as we can see in Table 6-3-1. In the same figure are also presented the unbiasedness line, i.e., the line which passes through the origin at an angle of 45 degrees from the abscissa, and the regression of the Bayes modal estimate with the prior N(0.0, 1.0), by a solid line and a dotted curve, respectively. We can see in this figure that, within each interval, each of these two regressions is reasonably close to the unbiasedness line, and much closer than the regression of the Bayes modal estimate. If we widen the interval further, the deviation from the unbiasedness line becomes larger (cf. RR-80-3).

TABLE 6-3-1

Eleven Sets of Estimates, $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$, of Ability for the Two Extreme Response Patterns, (0,0,...,0) and (1,1,...,1), Obtained on LIS-U, Using Eleven Different Intervals for $(\underline{\theta}, \bar{\theta})$.

Interval    θ*(V-min)    θ*(V-max)
±1.50       -1.47883      1.52237
±1.75       -1.64702      1.65605
±2.00       -1.79255      1.77649
±2.25       -1.92540      1.89233
±2.50       -2.05136      2.00754
±3.00       -2.29490      2.24127
±3.50       -2.53641      2.48011
±4.00       -2.77945      2.72254
±4.50       -3.02430      2.96720
±5.00       -3.27051      3.21329
±5.50       -3.51765      3.46032

Since the least finite value of the maximum likelihood estimate for LIS-U is -1.3167, for the response pattern (0,0,0,1,0,0,0), and the greatest finite value is 1.3028, for (1,1,1,0,1,1,1), either one of the above sets of $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$ will be adequate, and so will any of those obtained by using intervals between (-1.5, 1.5) and (-2.25, 2.25).


The introduction of the new estimator, $\theta^*_V$, has enhanced the range of ability for which a given test is meaningful without sacrificing the objectivity of testing, which Bayesian estimates do sacrifice. When the number of items is as small as seven and all items are binary, as is the case with LIS-U, the computation of $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$ is relatively easy, owing to the fact that the number of all possible response patterns is as small as 128.

FIGURE 6-3-2

Two Regressions of the Modified Maximum Likelihood Estimate, $\theta^*_V$, on Ability $\theta$, Using (-1.5, 1.5) (Dashed Curve) and (-2.25, 2.25) (Solid Curve) as $(\underline{\theta}, \bar{\theta})$, Together with the Regression of the Bayes Modal Estimate with N(0, 1) as the Prior (Dotted Curve).

Note, however, that an increase in the number of items, and/or in the number of item scores for each item, will soon make it practically impossible to compute these two substitute estimates, since the number of all possible response patterns increases by gigantic steps. For example, if a test has ten binary items instead of seven, the number of all possible response patterns will be 1,024; if a test has seven items with three item-score categories each, the number will be 2,187; and if a test has fifteen such items, it will be as large as 14,348,907!
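The counts quoted above are simply products of the numbers of item-score categories. The short Python check below, using a hypothetical helper name, reproduces them.

```python
from math import prod


def n_patterns(categories_per_item):
    """Number of distinct response patterns: the product of the items' category counts."""
    return prod(categories_per_item)


# seven binary, ten binary, seven three-category, fifteen three-category items
print(n_patterns([2] * 7), n_patterns([2] * 10), n_patterns([3] * 7), n_patterns([3] * 15))
# 128 1024 2187 14348907
```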

It is necessary, therefore, that we invent some method to deal with the situation in which the number of all the possible response patterns is too large for us to compute $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$ directly. Thanks to the availability of electronic computers and the Monte Carlo method, this can be done by introducing sample-statistic versions of the two estimators.


Let $N$ be the number of examinees who were selected randomly from the uniform distribution on the interval of $\theta$, $(\underline{\theta}, \bar{\theta})$. Let $N_L$ denote the number of examinees who belong to the above sample and whose levels of ability are lower than the critical value $\theta_c$, and $N_H$ the number of those whose ability levels are higher than, or equal to, $\theta_c$. Thus we can write

$$
N = N_L + N_H. \tag{6.9}
$$


Let $N_{LV}$ and $N_{HV}$ denote the numbers of examinees in the above two subgroups of the sample, respectively, who obtained the response pattern $V$. Thus we have

$$
N_L = \sum_V N_{LV}, \qquad N_H = \sum_V N_{HV}. \tag{6.10}
$$


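It can be seen that the sample statistic corresponding to $\int_{\underline{\theta}}^{\theta_c} P_V(\theta)\, d\theta$ in formula (6.8) is $N_L^{-1} N_{LV} (\theta_c - \underline{\theta})$, and the one for $\int_{\theta_c}^{\bar{\theta}} P_V(\theta)\, d\theta$ is $N_H^{-1} N_{HV} (\bar{\theta} - \theta_c)$: since the abilities in the lower subgroup are uniformly distributed on $(\underline{\theta}, \theta_c)$, the ratio $N_{LV}/N_L$ estimates the average of $P_V(\theta)$ over that interval, and similarly for the upper subgroup. Substituting these sample statistics into (6.8) and rearranging, we obtain $\hat{\theta}^*_{V\text{-min}}$ and $\hat{\theta}^*_{V\text{-max}}$ such that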


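$$
\begin{aligned}
\hat{\theta}^*_{V\text{-min}} &= \Big[\tfrac{1}{2}\left(\underline{\theta} + \theta_c\right) N_L - \sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V N_{LV}\Big] N_{LV\text{-min}}^{-1} \\
\hat{\theta}^*_{V\text{-max}} &= \Big[\tfrac{1}{2}\left(\theta_c + \bar{\theta}\right) N_H - \sum_{V \ne V\text{-min},\, V\text{-max}} \hat{\theta}_V N_{HV}\Big] N_{HV\text{-max}}^{-1},
\end{aligned}
\tag{6.11}
$$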




where $N_{LV\text{-min}}$ and $N_{HV\text{-max}}$ are the numbers of examinees who belong to the lower subgroup and obtained the response pattern V-min, and of those who belong to the upper subgroup and obtained the response pattern V-max, respectively.

It can be seen that $\hat{\theta}^*_{V\text{-min}}$ and $\hat{\theta}^*_{V\text{-max}}$, which were defined in the preceding paragraph, are consistent, i.e., they converge in probability to $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$, respectively, as the sample sizes increase. In other words, if $N_{LV\text{-min}}$ and $N_{HV\text{-max}}$ are large enough, the probabilities with which $\hat{\theta}^*_{V\text{-min}}$ and $\hat{\theta}^*_{V\text{-max}}$ assume values within the vicinities of $\theta^*_{V\text{-min}}$ and $\theta^*_{V\text{-max}}$, respectively, will be very high. Although the two numbers, $N_{LV\text{-min}}$ and $N_{HV\text{-max}}$, also depend upon the choice of the interval, $(\underline{\theta}, \bar{\theta})$, by virtue of the Monte Carlo method we can control the two sample sizes, $N_L$ and $N_H$, as we wish.

A procedure with which we may obtain $\hat{\theta}^*_{V\text{-min}}$ and $\hat{\theta}^*_{V\text{-max}}$, which are defined by (6.11), can be summarized as follows.


(1) Determine the interval, $(\underline{\theta}, \bar{\theta})$.

(2) Obtain the critical value, $\theta_c$.

(3) Determine the sample size, $N$, which makes both $N_L$ and $N_H$ large enough for our purpose.

(4) Produce the ability levels of these $N$ hypothetical subjects from the uniform distribution on the interval $(\underline{\theta}, \bar{\theta})$. This can be done either by the Monte Carlo method, or by placing the $N$ examinees at equally spaced points over the entire interval, $(\underline{\theta}, \bar{\theta})$, or by using one of its variations.

(5) Generate by the Monte Carlo method a response pattern for each of the $N$ hypothetical examinees with respect to the $n$ test items of our test.

(6) Find the two frequencies, $N_{LV}$ and $N_{HV}$, for each response pattern $V$.

(7) Obtain the maximum likelihood estimate for each response pattern whose frequencies, $N_{LV}$ and $N_{HV}$, are not both zero, excluding V-min and V-max.

(8) Use the above results in (6.11), and compute $\hat{\theta}^*_{V\text{-min}}$ and $\hat{\theta}^*_{V\text{-max}}$.

Note that the probabilities with which we obtain positive frequencies for $N_{LV\text{-max}}$ and $N_{HV\text{-min}}$ are negligibly small, and this fact can be used as a checking process.
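Steps (1) through (8) can be sketched in Python as follows. The sketch assumes a hypothetical seven-item logistic test for which $\theta_c = 0$ (true for these symmetric items), places the examinees at equally spaced points as step (4) allows, and also returns the two checking counts $N_{HV\text{-min}}$ and $N_{LV\text{-max}}$. For only seven items the direct computation of (6.8) would be feasible; the sketch merely illustrates the mechanics, and all names, parameters, and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter

D = 1.7
# Hypothetical seven-item logistic test (not LIS-U); theta_c = 0 for these items.
A = [1.0, 1.2, 0.8, 1.5, 0.8, 1.2, 1.0]
B = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]


def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def mle(pattern, lo=-30.0, hi=30.0, tol=1e-10):
    """Finite MLE of a non-extreme binary pattern, by bisection on the score function."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        score = sum(D * a * (x - icc(mid, a, b)) for x, a, b in zip(pattern, A, B))
        lo, hi = (mid, hi) if score > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def monte_carlo_substitutes(lo, hi, theta_c, n_examinees, seed=0):
    """Sample-statistic estimates of theta*_{V-min} and theta*_{V-max} following (6.11)."""
    rng = random.Random(seed)
    v_min, v_max = (0,) * len(A), (1,) * len(A)
    low, high = Counter(), Counter()  # N_LV and N_HV
    for i in range(n_examinees):
        # Step (4): equally spaced ability levels over (lo, hi).
        theta = lo + (i + 0.5) * (hi - lo) / n_examinees
        # Step (5): simulate one response pattern.
        v = tuple(1 if rng.random() < icc(theta, a, b) else 0 for a, b in zip(A, B))
        # Step (6): tally by subgroup.
        (low if theta < theta_c else high)[v] += 1
    if low[v_min] == 0 or high[v_max] == 0:
        raise ValueError("N_LV-min or N_HV-max is zero; increase n_examinees")
    # Step (7): MLEs for the observed non-extreme patterns.
    est = {v: mle(v) for v in set(low) | set(high) if v not in (v_min, v_max)}
    low_sum = sum(est[v] * n for v, n in low.items() if v in est)
    high_sum = sum(est[v] * n for v, n in high.items() if v in est)
    # Step (8): formula (6.11).
    n_low, n_high = sum(low.values()), sum(high.values())
    t_min = (0.5 * (lo + theta_c) * n_low - low_sum) / low[v_min]
    t_max = (0.5 * (theta_c + hi) * n_high - high_sum) / high[v_max]
    return t_min, t_max, high[v_min], low[v_max]


print(monte_carlo_substitutes(-1.5, 1.5, 0.0, 20000))
```

The last two returned values are the checking counts mentioned above; they should stay near zero, while the two estimates fluctuate from seed to seed as the text warns.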

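Thus we can define the new estimator, $\hat{\theta}^*_V$, such that

$$
\hat{\theta}^*_V =
\begin{cases}
\hat{\theta}^*_{V\text{-min}} & \text{for } V = V\text{-min} \\
\hat{\theta}^*_{V\text{-max}} & \text{for } V = V\text{-max} \\
\hat{\theta}_V & \text{otherwise,}
\end{cases}
\tag{6.12}
$$

as distinct from $\theta^*_V$, which is defined by (6.4). Unlike $\theta^*_V$, this estimator, $\hat{\theta}^*_V$, depends upon the Monte Carlo method and, therefore, has some fluctuations. In order to reach high accuracy, we need large numbers for $N_{LV\text{-min}}$ and $N_{HV\text{-max}}$.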

(VI. 4)  Bayes  Estimator  with  a  Uniform  Density  as  the  Prior 

Let $\tilde{\theta}_V$ be the Bayes estimator with the prior $f(\theta)$. We can write

$$
\tilde{\theta}_V = \int \theta\, f_V(\theta)\, d\theta, \tag{6.13}
$$

where $f_V(\theta)$ is the density function of $\theta$ for the subgroup of examinees whose response patterns are uniformly $V$, which is given by

$$
f_V(\theta) = f(\theta)\, P_V(\theta) \left[\int f(\theta)\, P_V(\theta)\, d\theta\right]^{-1}. \tag{6.14}
$$

This estimator is the one which minimizes (Samejima, 1969) the mean square error

$$
Q = E\big[(\theta^{**} - \theta)^2\big], \tag{6.15}
$$

where $\theta^{**}$ is any conceivable estimator of $\theta$ based upon the response pattern $V$. It is obvious that this estimator heavily depends upon the prior.

We can think of a population-free estimator based on the Bayes estimator, obtained by removing the influence of a particular prior. Let us assume that we can more or less specify the interval, $(\underline{\theta}, \bar{\theta})$, for which our test is meaningful. To lift the effect of a given prior, we shall use the uniform density on this interval of $\theta$. Let $\tilde{\theta}^U_V$ be the resultant estimator. Thus we have

$$
f(\theta) =
\begin{cases}
(\bar{\theta} - \underline{\theta})^{-1} & \text{for } \underline{\theta} < \theta < \bar{\theta} \\
0 & \text{otherwise.}
\end{cases}
\tag{6.16}
$$


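Substituting (6.16) into (6.13) and rearranging, we obtain

$$
\tilde{\theta}^U_V = \int_{\underline{\theta}}^{\bar{\theta}} \theta\, P_V(\theta)\, d\theta \left[\int_{\underline{\theta}}^{\bar{\theta}} P_V(\theta)\, d\theta\right]^{-1}. \tag{6.17}
$$

Note that this estimator depends solely upon the operating characteristic, $P_V(\theta)$, and the interval, $(\underline{\theta}, \bar{\theta})$, for which our test is meaningful.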


In practice, it may not be wise to use this estimator for every response pattern, since even with a relatively small number of test items the number of response patterns is large and the calculation of the estimates is time-consuming. We could, however, use the two estimates, $\tilde{\theta}^U_{V\text{-min}}$ and $\tilde{\theta}^U_{V\text{-max}}$, as replacements for the negative and positive infinities of the maximum likelihood estimate for the two extreme response patterns, V-min and V-max, without going through overly tedious computations.
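Because (6.17) involves only $P_V(\theta)$ and the interval, each of the two replacement values is a ratio of two one-dimensional integrals. The Python sketch below evaluates them with Simpson's rule for a hypothetical seven-item logistic test; the item parameters, the interval (-2, 2), and the function names are illustrative assumptions.

```python
import math

D = 1.7
# Hypothetical seven-item logistic test (not from this study).
A = [1.0, 1.2, 0.8, 1.5, 0.8, 1.2, 1.0]
B = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]


def pattern_prob(pattern, theta):
    """Operating characteristic P_V(theta) of a binary response pattern."""
    p = 1.0
    for x, a, b in zip(pattern, A, B):
        pg = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
        p *= pg if x == 1 else 1.0 - pg
    return p


def uniform_prior_bayes(pattern, lo, hi, n=400):
    """Posterior mean of theta under a uniform prior on (lo, hi), as in (6.17).

    Both integrals use the same Simpson weights, so the common factor h/3 cancels.
    """
    h = (hi - lo) / n
    num = den = 0.0
    for k in range(n + 1):
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        t = lo + k * h
        p = pattern_prob(pattern, t)
        num += w * t * p
        den += w * p
    return num / den


v_min, v_max = (0,) * len(A), (1,) * len(A)
print(uniform_prior_bayes(v_min, -2.0, 2.0), uniform_prior_bayes(v_max, -2.0, 2.0))
```

Only the two extreme patterns need this computation; the finite maximum likelihood estimates can be kept for every other pattern, which is the economy suggested above.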

(VI. 5) Subtest

We notice in Figure 3-4-1 that the square root of the test information function for Subtest 3 decreases quickly as $\theta$ departs from the middle part of the interval (-3.0, 3.0). With this subtest as the Old Test, we find fourteen hypothetical examinees who obtained V-min, and twelve who obtained V-max, as their response patterns. Since these twenty-six examinees make up as much as 5.2 percent of the total number of examinees, instead of excluding them we decided to keep them and experiment with them on the alternative estimator, $\hat{\theta}^*_V$, which was introduced in Section VI.3.

With Subtest 9 as our Old Test, we find one examinee who obtained V-min and one whose response pattern is V-max. In this case, we excluded these two from our original data and used 498 examinees in our estimation process, since there are only two and their exclusion will not change the result substantially. With all the other subtests, none of our hypothetical examinees obtained V-min or V-max.

Table 6-5-1 presents the identification number and the ability level for each of the fourteen hypothetical examinees who obtained V-min and the twelve who obtained V-max, for Subtest 3. We can see in this table that, although most of these twenty-six examinees have ability levels equal to or close to one of the two extreme values of $\theta$, -2.475 and 2.475, there are some examinees, like 118, 210 and 491, whose ability levels are substantially less than 2.475 in absolute value. Tables 6-5-2 and 6-5-3 present the response patterns of these twenty-six examinees for the ten unknown, binary test items.

TABLE 6-5-1

Identification Number and Ability Level of Each of the Fourteen Hypothetical Examinees Who Obtained V-min, and of the Twelve Who Obtained V-max, of Subtest 3.

We need some modification to the estimator, however. Since the square root of the test information function of Subtest 3 is not constant, we must transform $\theta$ to $\tau$ in the process of estimating the operating characteristics of the item scores. We recall that, with the transformed scale of ability, asymptotic unbiasedness and normality were used as the approximation to the conditional distribution of the maximum likelihood estimate, $\hat{\tau}_V$, given $\tau$. We therefore need the unbiasedness of the modified estimator with respect to $\tau$, instead of $\theta$. Let $\tau^*_V$ be the estimator with respect to $\tau$. Thus we can write


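$$
\tau^*_V =
\begin{cases}
\tau^*_{V\text{-min}} & \text{for } V = V\text{-min},\\
\tau^*_{V\text{-max}} & \text{for } V = V\text{-max},\\
\hat{\tau}_V & \text{otherwise},
\end{cases}
\tag{6.18}
$$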


where $\tau^*_{V\text{-min}}$ and $\tau^*_{V\text{-max}}$ are defined by

$$
\begin{aligned}
\tau^*_{V\text{-min}} &= \Big[\tfrac{1}{2}(\underline{\tau} + \tau_c)\,\underline{N} - \sum_{V \neq V\text{-min},\,V\text{-max}} \hat{\tau}_V\,\underline{N}_V\Big]\,N_{V\text{-min}}^{-1},\\
\tau^*_{V\text{-max}} &= \Big[\tfrac{1}{2}(\tau_c + \overline{\tau})\,\overline{N} - \sum_{V \neq V\text{-min},\,V\text{-max}} \hat{\tau}_V\,\overline{N}_V\Big]\,N_{V\text{-max}}^{-1}.
\end{aligned}
\tag{6.19}
$$

In these formulas, $\tau_c$, $\underline{\tau}$, $\overline{\tau}$, and the maximum likelihood estimate, $\hat{\tau}_V$, can uniformly be transformed from $\theta_c$, $\underline{\theta}$, $\overline{\theta}$ and $\hat{\theta}_V$ by means of $\tau = \tau(\theta)$.

TABLE 6-5-2

Identification Number and the Response Pattern of the Ten Unknown, Binary Items Obtained by Each of the Fourteen Hypothetical Examinees Whose Response Patterns of Subtest 3 Are V-min.

     ID    Response Pattern
      1    0001000000
    101    0100000000
    201    0100000000
    401    1000000000
      2    0100000000
    102    0000000000
    202    0000000000
    302    1000000000
    303    1000000000
      4    (illegible)
    108    1000000000
    109    1001000000
    210    1000000000
    118    1010000000
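Equations (6.18) and (6.19) can be sketched in code as follows. This is a minimal illustration, not the report's program: it assumes, as the midpoint $\tfrac{1}{2}(\underline{\tau} + \tau_c)$ in (6.19) suggests, that the hypothetical examinees' values of $\tau$ are spread uniformly over each subinterval, and it represents the infinite estimates of V-min and V-max as `-math.inf` and `math.inf`. The function names `replacement_value` and `modified_estimates` and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def replacement_value(tau_true: Sequence[float], tau_hat: Sequence[float],
                      lo: float, hi: float, extreme: float) -> Optional[float]:
    """Value that makes the mean of the modified estimates in the subsample
    lo < tau < hi equal to the midpoint (lo + hi) / 2, as in Eq. (6.19).

    `extreme` is -math.inf (pattern V-min) or math.inf (pattern V-max).
    Returns None when no examinee in the subsample has that pattern.
    """
    sub = [th for t, th in zip(tau_true, tau_hat) if lo < t < hi]
    n_ext = sum(1 for th in sub if th == extreme)
    if n_ext == 0:
        return None
    # Only finite estimates enter the sum; both extreme patterns are excluded.
    finite_sum = sum(th for th in sub if math.isfinite(th))
    return (0.5 * (lo + hi) * len(sub) - finite_sum) / n_ext


def modified_estimates(tau_true: Sequence[float], tau_hat: Sequence[float],
                       tau_lo: float, tau_c: float, tau_hi: float
                       ) -> Tuple[Optional[float], Optional[float], List[Optional[float]]]:
    """Replace -inf / +inf estimates as in Eq. (6.18), using Eq. (6.19)."""
    t_min = replacement_value(tau_true, tau_hat, tau_lo, tau_c, -math.inf)
    t_max = replacement_value(tau_true, tau_hat, tau_c, tau_hi, math.inf)
    out: List[Optional[float]] = []
    for th in tau_hat:
        if th == -math.inf:
            out.append(t_min)
        elif th == math.inf:
            out.append(t_max)
        else:
            out.append(th)
    return t_min, t_max, out


# Hypothetical toy data: true tau values and their estimates.
tau_true = [-2.5, -2.0, -1.0, -0.5, 0.5, 1.0, 2.5]
tau_hat = [-math.inf, -2.2, -1.1, -0.4, 0.6, 1.2, math.inf]
t_min, t_max, tau_star = modified_estimates(tau_true, tau_hat, -3.0, 0.0, 3.0)
# t_min is approximately -2.3 and t_max approximately 2.7.
```

After the substitution, the mean of the modified estimates in each subsample equals the midpoint of that subsample's interval, which is the conditional unbiasedness the text asks for.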

Figure 6-5-1 presents the two operating characteristics, $P_{V\text{-min}}(\tau)$ and $P_{V\text{-max}}(\tau)$, as functions of $\tau$, by solid and dotted curves, respectively. The same figure also shows the positions of the two values of $\tau_c$ which we used separately. Eight different intervals were used for $(\underline{\tau}, \overline{\tau})$, and the results are shown in Tables 6-5-4 and 6-5-5 for $\tau_c = 0.1203$ and $\tau_c = -0.5455$, respectively. It is obvious that for the first three intervals of $\tau$ the results are meaningless, since the two frequencies, $N_{V\text{-min}}$ and $N_{V\text{-max}}$, are so small. We can also see that, as these frequencies grow larger, the resultant estimates get closer to each other over the two different values of $\tau_c$.

TABLE 6-5-3

Identification Number and the Response Pattern of the Ten Unknown, Binary Items Obtained by Each of the Twelve Hypothetical Examinees Whose Response Patterns of Subtest 3 Are V-max.

     ID    Response Pattern
    491    1111111000
    193    1111111110
    493    1111111110
    294    1111111111
    296    1111111111
    397    1111111111
     98    1111111111
    198    1111101110
    199    1111111111
    299    1111111110
    499    1111111111
    300    1111111111

FIGURE 6-5-1

Operating Characteristics of V-min (Solid Line) and V-max (Dotted Line) of Subtest 3 Given as Functions of the Transformed Latent Trait $\tau$, Together with the Critical Value, $\tau_c$, Set at Two Different Positions.

TABLE 6-5-4

Two Estimates, $\tau^*_{V\text{-min}}$ and $\tau^*_{V\text{-max}}$, Obtained by Using Each of the Eight Different Intervals, $(\underline{\tau}, \overline{\tau})$, and $\tau_c = 0.1203$ for Subtest 3. The Sample Sizes, $\underline{N}$, $\overline{N}$ and $N$, Together with the Two Frequencies, $N_{V\text{-min}}$ and $N_{V\text{-max}}$, Are Also Presented for Each Case.

  Case   lower τ   upper τ   τ*(V-min)   τ*(V-max)   N(lower)   N(upper)       N
    1    -1.8456    2.0771     2.9707     -0.6316      1,640      1,630      3,270
    2    -2.0521    2.2668     5.8168      0.6564      1,810      1,790      3,600
    3    -2.2461    2.4373    -1.5891      1.7371      1,970      1,930      3,900
    4    -2.4273    2.5860    -1.8162      2.2439      2,125      2,055      4,180
    5    -2.5131    2.6516    -2.2006      2.4000      2,195      2,110      4,305
    6    -2.6757    2.7636    -2.5467      2.6242      2,330      2,205      4,535
    7    -2.8267    2.8095    -2.7265      2.7370      2,455      2,240      4,695
    8    -3.0000    3.0000    -2.8432      2.8855      2,600      2,400      5,000

TABLE 6-5-5

Two Estimates, $\tau^*_{V\text{-min}}$ and $\tau^*_{V\text{-max}}$, Obtained by Using Each of the Eight Different Intervals, $(\underline{\tau}, \overline{\tau})$, and $\tau_c = -0.5455$ for Subtest 3. The Sample Sizes, $\underline{N}$, $\overline{N}$ and $N$, Together with the Two Frequencies, $N_{V\text{-min}}$ and $N_{V\text{-max}}$, Are Also Presented for Each Case.

  Case   lower τ   upper τ   τ*(V-min)   τ*(V-max)   N(upper)       N
    1    -1.8456    2.0771     7.7998     -2.2507      2,185      3,270
    2    -2.0521    2.2668    11.3745      0.1132      2,345      3,600
    3    -2.2461    2.4373    -0.8183      1.4841      2,465      3,900
    4    -2.4273    2.5860    -1.6061      2.0856      2,610      4,180
    5    -2.5131    2.6516    -2.0651      2.2750      2,665      4,305
    6    -2.6757    2.7636    -2.4788      2.5455      2,760      4,535
    7    -2.8267    2.8095    -2.6867      2.6865      2,795      4,695
    8    -3.0000    3.0000    -2.8214      2.8596      2,955      5,000

To compare these results with the largest and the smallest finite maximum likelihood estimates for Subtest 3, Table 6-5-6 shows the values of $\hat{\tau}_V$ for the fifteen response patterns in which only one item receives a score of 1 and all the others receive 0, and those for the fifteen other response patterns in which only one item receives a score of 1 and all the others receive the full score of 2. We can see in this table that the least finite $\hat{\tau}_V$ is -2.6518 and the greatest finite $\hat{\tau}_V$ is 2.7683. We notice in Tables 6-5-4 and 6-5-5 that only the largest interval of $\tau$, (-3.0, 3.0), provides us with two alternative estimates which are both greater in absolute value than those two finite estimates. Our selection is, therefore, -2.843 for $\tau^*_{V\text{-min}}$ and 2.885 for $\tau^*_{V\text{-max}}$. The sample regression of $\tau^*$ on $\tau$ for our five hundred observations turned out to be $0.998\tau + 0.001$ (cf. RR-81-2), which is very close to unbiasedness.

TABLE 6-5-6

Fifteen Response Patterns of Subtest 3, Each of Which Consists of Fourteen Zeros and One "1", and the Corresponding Two Maximum Likelihood Estimates, $\hat{\theta}_V$ and $\hat{\tau}_V$, for Each Response Pattern, and Another Fifteen Response Patterns, Each of Which Has (n - 1) $m_g$'s and One $(m_g - 1)$, and the Corresponding $\hat{\theta}_V$ and $\hat{\tau}_V$ for Each.

  Response Pattern      θ̂_V       τ̂_V      Response Pattern      θ̂_V      τ̂_V
  000000000000001    -1.3998   -1.7296    222222222222221    2.3526   2.6855
  000000000000010    -1.5206   -1.8562    222222222222212    2.3454   2.6800
  000000000000100    -1.9182   -2.2347    222222222222122    2.4651   2.7683
  000000000001000    -1.6990   -2.0336    222222222221222    2.2762   2.6258
  000000000010000    -1.9465   -2.2592    222222222212222    2.3359   2.6727
  000000000100000    -1.8783   -2.1995    222222222122222    2.1981   2.5620
  000000001000000    -1.8346   -2.1603    222222221222222    2.0525   2.4359
  000000010000000    -2.0033   -2.3075    222222212222222    2.0810   2.4613
  000000100000000    -2.0205   -2.3218    222222122222222    1.9725   2.3627
  000001000000000    -2.1792   -2.4483    222221222222222    2.0237   2.4098
  000010000000000    -2.0811   -2.3714    222212222222222    1.7479   2.1437
  000100000000000    -2.3846   -2.5959    222122222222222    2.0530   2.4363
  001000000000000    -2.3887   -2.5987    221222222222222    1.9407   2.3329
  010000000000000    -2.3585   -2.5782    212222222222222    1.7595   2.1555
  100000000000000    -2.4698   -2.6518    122222222222222    1.8532   2.2488
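The rule used above for choosing the replacement values, namely that both estimates must lie beyond the least and greatest finite estimates of Table 6-5-6, can be checked directly against the values transcribed from Tables 6-5-4 and 6-5-5. The following Python sketch is only a restatement of that comparison; the helper name `admissible_cases` is hypothetical.

```python
from typing import List, Sequence, Tuple

# (lower end, upper end, tau*_V-min, tau*_V-max) for Cases 1-8, transcribed
# from Table 6-5-4 (tau_c = 0.1203) and Table 6-5-5 (tau_c = -0.5455).
Row = Tuple[float, float, float, float]
TABLE_6_5_4: List[Row] = [
    (-1.8456, 2.0771, 2.9707, -0.6316), (-2.0521, 2.2668, 5.8168, 0.6564),
    (-2.2461, 2.4373, -1.5891, 1.7371), (-2.4273, 2.5860, -1.8162, 2.2439),
    (-2.5131, 2.6516, -2.2006, 2.4000), (-2.6757, 2.7636, -2.5467, 2.6242),
    (-2.8267, 2.8095, -2.7265, 2.7370), (-3.0000, 3.0000, -2.8432, 2.8855),
]
TABLE_6_5_5: List[Row] = [
    (-1.8456, 2.0771, 7.7998, -2.2507), (-2.0521, 2.2668, 11.3745, 0.1132),
    (-2.2461, 2.4373, -0.8183, 1.4841), (-2.4273, 2.5860, -1.6061, 2.0856),
    (-2.5131, 2.6516, -2.0651, 2.2750), (-2.6757, 2.7636, -2.4788, 2.5455),
    (-2.8267, 2.8095, -2.6867, 2.6865), (-3.0000, 3.0000, -2.8214, 2.8596),
]

# Least and greatest finite maximum likelihood estimates from Table 6-5-6.
LEAST_FINITE = -2.6518
GREATEST_FINITE = 2.7683


def admissible_cases(rows: Sequence[Row], least: float = LEAST_FINITE,
                     greatest: float = GREATEST_FINITE) -> List[int]:
    """Case numbers whose two estimates both lie beyond the finite extremes."""
    return [case for case, (_, _, t_min, t_max) in enumerate(rows, start=1)
            if t_min < least and t_max > greatest]


print(admissible_cases(TABLE_6_5_4))  # [8]
print(admissible_cases(TABLE_6_5_5))  # [8]
```

Both tables single out Case 8, the interval (-3.0, 3.0), in agreement with the text; in Case 7 only the V-min estimate passes the test.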


The other two alternative estimates for V-min and V-max, which were introduced in the preceding section, were also computed for each of the eight intervals. These values were calculated with respect to $\tau$, instead of $\theta$, and we obtained -1.7434, -1.9286, -2.0965, -2.2464, -2.3143, -2.4364, -2.5402 and -2.7527 for V-min, and 1.9980, 2.1810, 2.3457, 2.4905, 2.5551, 2.6684, 2.7171 and 2.7805 for V-max, for Cases 1 through 8, respectively. As the interval of $\tau$ grows larger, the resultant estimates get closer to the corresponding values for V-min and V-max. We did not use them as substitutes for the negative and positive infinities of the maximum likelihood estimate, however, since the conditional unbiasedness of our estimate is an important characteristic in the rationale behind our methods and approaches for estimating the operating characteristics of unknown test items.


(VI. 6)  Nine  Subtests  As  Our  Old  Test 

In the first year of the present research, the original Old Test was used exclusively as our Old Test in estimating the item characteristic functions of the ten unknown, binary test items. Thus, of the eleven research reports written on the estimation of the operating characteristics, the first seven, RR-77-1 and RR-78-1 through RR-78-6, are based upon the original Old Test, while the other four, RR-80-2, RR-80-4, RR-81-2 and RR-81-3, are based upon the nine subtests of the original Old Test (cf. Chapter 2). The original Old Test consists of thirty-five test items of three score categories each, whose item parameters are given in Table 3-4-1 of Section III.4, with each item following the normal ogive model. Furthermore, it has an approximately constant square root of the test information function, 4.65, for the interval of ability of our interest.

This is an ideal situation, and it also provides us with simpler methods and approaches, in which no transformation of ability $\theta$ is needed. This situation can easily be realized in adaptive testing, which we shall observe in Chapter 7.

On the other hand, it will be meaningful to test the robustness of our methods and approaches for estimating the operating characteristics by using a less than ideal Old Test, i.e., one which has fewer test items and a non-constant square root of the test information function. This experiment, if the result turns out to be supportive, will have the benefit of expanding the applicability of our methods and approaches, since in the paper-and-pencil testing situation most tests do not provide us with constant amounts of test information.

The selection of the test items for each of the nine subtests of our original Old Test is shown in Table 3-4-1, and the square root of the test information function is given in Figure 3-4-1, of Section III.4. We notice that Subtest 3 is also a subtest of Subtest 1, and Subtest 4 is a subtest of Subtest 2, while the other five subtests are subtests of the original Old Test only.

In this experiment, the Simple Sum Procedure of the Conditional P.D.F. Approach (cf. Section V.13) with the Normal Approach Method (cf. Section V.9) was selected as our combination of a method and an approach. The main reason for selecting the Simple Sum Procedure is its simplicity: it requires neither the approximation to the density function of $\hat{\tau}$ with respect to each item score category of each unknown test item, as the Bivariate P.D.F. Approach does, nor the weight and the proportion which the Weighted Sum Procedure and the Proportioned Sum Procedure need, respectively. The main reasons for selecting the Normal Approach Method are, again, its simplicity, since it requires only the first two conditional moments of $\tau$, given $\hat{\tau}$, and the fact that, when the criterion item characteristic function had been obtained in the Simple Sum Procedure for each of the unknown test items, the results obtained by the Normal Approach Method, as well as those obtained by the Pearson System Method and the Two-Parameter Beta Method, turned out to be practically identical with the criterion item characteristic function (cf. Section V.13).
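A minimal sketch of how the Simple Sum Procedure combined with the Normal Approach Method might be computed for one binary item. It assumes that the Simple Sum Procedure estimates the item characteristic function as the ratio of the conditional densities of $\tau$, given each examinee's estimate, summed over those who answered correctly, to the same sum over all examinees, and that the Normal Approach Method replaces each conditional density by a normal density with the first two conditional moments. Those moments are treated here as given inputs; how they are derived (Chapter 5) is not reproduced, and the function name and toy data are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence


def simple_sum_icf(tau: float, cond_means: Sequence[float],
                   cond_sds: Sequence[float], correct: Sequence[int]) -> float:
    """Estimated probability of a correct answer at tau.

    cond_means[s], cond_sds[s]: assumed first two conditional moments of tau,
    given examinee s's estimate; correct[s] is 1 if s answered correctly.
    """
    num = 0.0
    den = 0.0
    for mean, sd, u in zip(cond_means, cond_sds, correct):
        density = NormalDist(mean, sd).pdf(tau)  # normal approximation
        den += density
        if u == 1:
            num += density
    return num / den if den > 0.0 else float("nan")


# Hypothetical toy data: two examinees, one correct and one incorrect.
curve = [simple_sum_icf(t, [1.0, -1.0], [0.5, 0.5], [1, 0])
         for t in (-1.0, 0.0, 1.0)]
```

With the two toy examinees placed symmetrically, the estimated curve passes through 0.5 at $\tau = 0$ and rises toward 1 near the examinee who answered correctly.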

With each of Subtests 1, 2 and 3 as our Old Test, both the Degree 3 and Degree 4 Cases were applied. For the other six subtests, however, only the Degree 4 Case was adopted. The reason for excluding the Degree 3 Case in this later research is that, in all the previous studies, the resultant estimated item characteristic functions obtained in the Degree 3 Case turned out to be practically identical with those obtained in the Degree 4 Case.

As we have seen in the preceding section, with Subtest 3 as our Old Test, we used the set of modified maximum likelihood estimates, $\tau^*_s$ ($s = 1, 2, \ldots, N$), as our basic data. With each of the other eight subtests as our Old Test, the set of maximum likelihood estimates, $\hat{\tau}_s$, was used.

This part of the research is partly credited to the conscientious effort of one of the author's assistants, Paul Changas.

(VI. 7) Sample Linear Regression of $\hat{\tau}_s$ on $\tau_s$

Figure 6-7-1 presents the scatter diagram of ability $\theta_s$ ($s = 1, 2, \ldots, N$) and its maximum likelihood estimate, $\hat{\theta}_s$, for our five hundred hypothetical examinees, obtained upon the original Old Test. We can see in this figure that the conditional unbiasedness of $\hat{\theta}_s$, given $\theta_s$, may be approximately satisfied.

FIGURE 6-7-1

Scatter Diagram of $\hat{\theta}_s$ Plotted Against $\theta_s$ for Our Five Hundred Hypothetical Examinees, Based upon the Original Old Test.

The sample linear regression of $\hat{\theta}$ on $\theta$, that is, the best-fitting linear function of $\theta$ in the least squares sense, turned out to be $1.004\theta - 0.006$, which is very close to the unbiasedness line, the line through the origin with unit slope.
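The sample linear regression referred to here is an ordinary least squares fit of the estimate on the true value. The sketch below computes it from scratch and applies it to simulated data. The simulation is an assumption for illustration only (ability uniform on (-3.0, 3.0), estimation error normal with standard deviation 1/4.65, the reciprocal of the approximately constant square root of the test information function of the original Old Test) and does not reproduce the report's data; the function name is hypothetical.

```python
import random
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Simulated, conditionally unbiased estimates (illustration only).
rng = random.Random(1981)
theta = [rng.uniform(-3.0, 3.0) for _ in range(500)]
theta_hat = [t + rng.gauss(0.0, 1.0 / 4.65) for t in theta]
slope, intercept = least_squares_line(theta, theta_hat)
```

Under conditional unbiasedness the fitted slope should be near 1 and the intercept near 0, which is the comparison the text makes with the unbiasedness line.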

Figure 6-7-2 presents the nine scatter diagrams of the transformed latent trait, $\tau_s$ ($s = 1, 2, \ldots, N$), and its maximum likelihood estimate, $\hat{\tau}_s$, for our five hundred hypothetical examinees, with the exception of the one for Subtest 9, in which four hundred and ninety-eight examinees are used (cf. Section VI.5). In this figure, for Subtest 3, the modified maximum likelihood estimate, $\tau^*_s$, is used instead of the maximum likelihood estimate, $\hat{\tau}_s$. For convenience, we shall not repeat this in the rest of this section and in Section VI.8, but the reader should understand that this is the case. The sample linear regressions of $\hat{\tau}_s$ on $\tau_s$ for seven of the nine scatter diagrams are as follows.


    Subtest 3:  $1.012\tau - 0.004$
    Subtest 4:  $1.003\tau + 0.004$
    Subtest 5:  $1.018\tau - 0.007$
    Subtest 6:  $1.011\tau - 0.000$
    Subtest 7:  $1.016\tau - 0.003$
    Subtest 8:  $1.000\tau - 0.009$
    Subtest 9:  $1.009\tau + 0.013$

We  can  see  that,  in  all  these  cases,  the  sample  linear  regressions 
are  very  close  to  the  unbiasedness  line,  and  practically 
indistinguishable  from  it. 

Examination of Figure 6-7-2 reveals, however, that the conditional normality of the distribution of $\hat{\tau}_s$, given $\tau_s$, may not be approximately satisfied for some subtests. It is obvious that, as the number of test items in the Old Test decreases, the conditional

item characteristic functions of the unknown binary test items.

(VI. 8) Polynomial Approximation to the Density Function, $g(\tau)$

Figure 6-8-1 presents the two polynomials of degrees 3 and 4, which were obtained by the method of moments (cf. Section V.6) to approximate the density function, $g(\tau)$, together with the frequency distribution of the five hundred $\hat{\tau}_s$'s, for each of Subtests 1, 2 and 3. In each of these three graphs, the resultant polynomial of degree 3 is plotted by a dotted curve, and that of degree 4 by a solid curve. For each of the other six subtests, the density function, $g(\tau)$, was approximated by a polynomial only in the Degree 4 Case, and the result is shown in Figure 6-8-2. We can see in these two figures that the curves and histograms vary considerably. They are similar for Subtests 2 and 4, but not very close for Subtests 1 and 3; for the latter, the modified maximum likelihood estimate, $\tau^*_s$, was used instead of $\hat{\tau}_s$. The histogram shows greater ups and downs as the number of test items decreases, a result which was predictable from our observations of the scatter diagrams in the preceding section.
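One way to carry out a polynomial approximation to $g(\tau)$ by the method of moments is to fix an interval $(a, b)$ and choose the coefficients of a degree-$n$ polynomial so that its moments of orders 0 through $n$ equal given target moments; this yields $n + 1$ linear equations in $n + 1$ unknowns. The sketch below does exactly that. It is an assumption that this matches the procedure of Section V.6 in detail: how the target moments of $\tau$ are obtained from the estimates is not reproduced, the interval is an input, and nothing forces the fitted polynomial to be non-negative, which a real application would need to check. All names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def moment_polynomial(moments: Sequence[float], lo: float, hi: float) -> List[float]:
    """Coefficients c_0..c_n of g(t) = sum_j c_j t**j on [lo, hi] whose
    moments of order 0..n equal `moments` (moments[0] should be 1)."""
    n = len(moments) - 1
    # Entry (k, j) is the integral of t**(k + j) over [lo, hi].
    a = [[(hi ** (k + j + 1) - lo ** (k + j + 1)) / (k + j + 1)
          for j in range(n + 1)] for k in range(n + 1)]
    return solve_linear(a, list(moments))


def poly_eval(coefs: Sequence[float], t: float) -> float:
    return sum(c * t ** j for j, c in enumerate(coefs))


# Moments 0..4 of the uniform density 0.5 on (-1, 1): recovers g(t) = 0.5.
coefs = moment_polynomial([1.0, 0.0, 1.0 / 3.0, 0.0, 0.2], -1.0, 1.0)
```

The usage line recovers the constant density 0.5 on (-1, 1) from its first five moments; a degree 4 fit uses five moment equations, a degree 3 fit four.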

For comparison, the reader is referred to Figure 4-1-2 of Section IV.1, in which similar graphs are shown for the approximation by the polynomials of degrees 3, 4 and 5, for the five hundred $\hat{\theta}_s$'s which were obtained upon the original Old Test.

(VI. 9)  Estimated  Item  Characteristic  Functions  Obtained  upon 
Subtests  1,  2  and  3 

As before, for the purpose of illustration, we shall take item 6 as an example. Figure 6-9-1 presents the criterion item characteristic functions (cf. Section V.13) obtained upon Subtests 1, 2 and 3, which are plotted by a dotted curve, a short-dashed curve, and dashes and dots, respectively, in comparison with the one obtained upon the original Old Test and the theoretical item characteristic function, which are shown by a long-dashed curve and a solid curve, respectively. We can see that the two criterion item characteristic functions obtained upon Subtest 1 and upon the original Old Test are practically indistinguishable, and the one for Subtest 3 is also very close to them. This is a common tendency among all ten binary test items. In contrast, for the interval of $\theta$, (-1.3, 1.6), the one obtained upon Subtest 2 is substantially different from the other three, and its fit to the theoretical item characteristic function is a little poorer. This is not the case with all the other nine binary test items, however. In fact, although for items 3, 5, 6 and 7 the fit is poorer for the ones obtained upon Subtest 2, the order is reversed for items 1, 2, 4, 8 and 10. It is interesting to note that for items of intermediate difficulty, like items 5, 6 and 7, the criterion item characteristic functions obtained upon Subtest 2 fit the corresponding theoretical item characteristic functions rather poorly. This result is more or less expected from the small amount of test information of Subtest 2 in the vicinity of $\theta = 0.0$.

FIGURE 6-8-1

Estimated Density Function, $g(\tau)$, Obtained by the Method of Moments as a Polynomial of Degree 3 (Dotted Curve) and 4 (Solid Curve), Together with the Relative Frequency Distribution of the Five Hundred $\hat{\tau}_s$, for Each of Subtests 1, 2 and 3. For Subtest 3, $\tau^*_s$ Is Used Instead of $\hat{\tau}_s$.

FIGURE 6-9-1

Four Criterion Item Characteristic Functions Obtained upon the Original Old Test (Long, Dashed Curve), upon Subtest 1 (Dotted Curve), upon Subtest 2 (Short, Dashed Curve) and upon Subtest 3 (Dashes and Dots), Together with the Theoretical Item Characteristic Function.


Estimated Item Characteristic Function of Item 6 Based upon Subtest 2 (Dotted Curve) and the One Based upon the Original Old Test (Dashed Curve) Obtained by the Simple Sum Procedure of the Conditional P.D.F. Approach with the Normal Approach Method, in Degree 3 and 4 Cases, in Comparison with the Theoretical Item Characteristic Function (Solid Curve) and the Frequency Ratios of Those Who Answered Correctly (Jagged Solid Line).

together with the corresponding results obtained upon the original Old Test, which are plotted by dashed curves. The same figures also present the theoretical item characteristic function of item 6 and the frequency ratios of those who answered correctly, by solid curves and jagged solid lines, respectively. It is striking to note that these results in the Degree 3 and 4 Cases are practically identical, for the interval of $\theta$, (-2.2, 2.2), for both Subtests 1 and 2. We also notice that they are very close to the corresponding criterion item characteristic functions, which we have observed in Figure 6-9-1. These findings are not new, but have been observed repeatedly before, in the results obtained upon the original Old Test. The results for Subtest 1 are practically identical with those obtained upon the original Old Test, for the interval of $\theta$, (-2.2, 2.2). These facts are true not only for item 6, but also for each and every one of the ten binary test items.

Figure 6-9-4 presents the corresponding results for Subtest 2 when the square root of the test information function is approximated by three different polynomials on three subintervals, as shown in Figure 4-6-3 of Section IV.6. We can see that the resultant estimated item characteristic functions are very similar to those presented in Figure 4-6-2, in both Degree 3 and 4 Cases. This turned out to be true with all the other nine binary test items, a result which indicates that the crude approximation to the square root of the test information function by a single polynomial of degree 7, which is shown in Figure 4-6-2, serves just as well as the more precise one obtained by the three different polynomials.
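The transformation of $\theta$ to $\tau$ that these polynomial approximations feed into can be sketched from the change-of-variable rule for information, $I_\tau = I_\theta\,(d\theta/d\tau)^2$: choosing $d\tau/d\theta = I(\theta)^{1/2}/C$ makes the square root of the test information equal to the constant $C$ on the $\tau$ scale. Below, $I(\theta)^{1/2}$ is supplied either as one polynomial or as several polynomials on subintervals, and the integral is taken analytically. The constant $C$, the origin $\theta_0 = 0$, and the function names are assumptions; Sections III.8 and V.3 give the report's own definition.

```python
import math
from typing import Sequence, Tuple

# (lower end, upper end, coefficients c_0..c_k of sqrt(I(theta)) on that piece)
Piece = Tuple[float, float, Sequence[float]]


def integrate_poly(coefs: Sequence[float], a: float, b: float) -> float:
    """Integral of sum_j coefs[j] * t**j from a to b."""
    return sum(c * (b ** (j + 1) - a ** (j + 1)) / (j + 1)
               for j, c in enumerate(coefs))


def tau_of_theta(theta: float, pieces: Sequence[Piece], c_const: float,
                 theta0: float = 0.0) -> float:
    """tau = (1/C) * integral from theta0 to theta of sqrt(I(t)) dt,
    with sqrt(I) given piecewise by polynomials on contiguous subintervals."""
    a, b = min(theta0, theta), max(theta0, theta)
    total = 0.0
    for lo, hi, coefs in pieces:
        left, right = max(a, lo), min(b, hi)
        if left < right:
            total += integrate_poly(coefs, left, right)
    sign = 1.0 if theta >= theta0 else -1.0
    return sign * total / c_const


# A single polynomial covers the whole line; here sqrt(I) is constant at 4.65,
# so tau coincides with theta when C = 4.65.
single = [(-math.inf, math.inf, [4.65])]
# Hypothetical piecewise case: sqrt(I) = 2 for theta < 0 and 4 for theta > 0.
pieces = [(-5.0, 0.0, [2.0]), (0.0, 5.0, [4.0])]
```

When the square root of the test information is already constant, as with the original Old Test, the transformation reduces to the identity, which is why no transformation was needed there.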

Figure 6-9-5 presents the corresponding results for Subtest 3. We can see in this figure that the resultant estimated item characteristic function obtained upon Subtest 3 is very close to the one obtained upon the original Old Test, in both Degree 3 and 4 Cases. This is a common tendency among all ten binary test items. The use of the modified maximum likelihood estimate, $\tau^*_s$, certainly did not adversely affect the resultant estimated item characteristic functions.


FIGURE 6-9-4

Estimated Item Characteristic Function of Item 6 Based upon Subtest 2 (Dotted Curve) and the One Based upon the Original Old Test (Dashed Curve) Obtained by the Simple Sum Procedure of the Conditional P.D.F. Approach with the Normal Approach Method, in Degree 3 and 4 Cases, in Comparison with the Theoretical Item Characteristic Function (Solid Curve) and the Frequency Ratios of Those Who Answered Correctly (Jagged Solid Line). The Set of Three Different Polynomials for the Three Subintervals Was Used for the Transformation of $\theta$ to $\tau$.

FIGURE 6-9-5

Estimated Item Characteristic Function of Item 6 Based upon Subtest 3 (Dotted Curve) and the One Based upon the Original Old Test (Dashed Curve) Obtained by the Simple Sum Procedure of the Conditional P.D.F. Approach with the Normal Approach Method, in Degree 3 and 4 Cases, in Comparison with the Theoretical Item Characteristic Function (Solid Curve) and the Frequency Ratios of Those Who Answered Correctly (Jagged Solid Line).


(VI. 10)  Estimated  Item  Characteristic  Functions  Obtained  upon  the 
Six  Other  Subtests 

Figure 6-10-1 presents the resultant estimated item characteristic functions of item 6 in the Degree 4 Case, which were obtained upon Subtests 4, 5, 6, 7, 8 and 9, respectively, by dotted curves, in comparison with the one obtained upon the original Old Test, the theoretical item characteristic function, and the frequency ratios of the correct answer, which are plotted by dashed and solid curves and jagged solid lines, respectively. We can see in this figure that, up to Subtest 6, the fit of the resultant estimated item characteristic function to the theoretical item characteristic function is reasonably good, but after that the estimated function grows flatter. This is a common tendency among all ten binary test items.

Figure 6-10-2 presents the corresponding results for the other nine binary test items, which were obtained upon Subtest 6. We can see in this figure that the fit of the estimated item characteristic function is very good for each of these items. In fact, for items 1, 2 and 4 the results fit the corresponding theoretical item characteristic functions better than those obtained upon the original Old Test, and they are just as good for items 6, 8 and 10. Considering that Subtest 6 contains only eleven test items, compared with thirty-five in the original Old Test, this result is outstanding. We must conclude, therefore, that our combination of a method and an approach is robust against a decrease in the number of test items in our Old Test.

It is desirable to experiment with combinations of a method and an approach for estimating the operating characteristics other than the Simple Sum Procedure of the Conditional P.D.F. Approach with the Normal Approach Method, which we used in the present study. This must wait for future research, however.


FIGURE 6-10-1

Estimated Item Characteristic Function of Item 6 Based upon Each of Subtests 4 through 9 (Dotted Curve), Obtained by the Simple Sum Procedure of the Conditional P.D.F. Approach and the Normal Approach Method, for Degree 4 Case, in Comparison with the One Based upon the Original Old Test (Dashed Curve), the Theoretical Item Characteristic Function (Smooth Solid Curve) and the Frequency Ratios of Those Who Answered Correctly (Jagged Solid Line).






VII  Adaptive  Testing 

In this chapter, we shall observe adaptive testing, or tailored testing, in the context of latent trait theory. By adaptive testing, we mean the testing situation in which test items are selected for an individual examinee, in accordance with the examinee's unknown ability level, from a prearranged item pool consisting of a large number of test items measuring the same ability, or abilities. Thus the search for the examinee's ability level and the search for suitable test items for him are conducted together, aiming at estimating the examinee's ability as accurately as we wish without spending too much time or giving the examinee too many test items. Efficiency in estimating the examinee's ability is, therefore, the essential part of adaptive testing. We can perform adaptive testing in the form of paper-and-pencil testing, but the most effective way may be the use of computers with screen terminals. Latent trait theory provides us with a strong rationale for adaptive testing, which classical test theory cannot possibly provide.

(VII. 1) Addition of New Test Items to the Item Pool

As was pointed out in Section III.4, the approaches and methods for estimating the operating characteristics of the discrete item responses, which were observed in Chapters 3, 5 and 6, are most useful in developing the item pool. When we start from scratch, the first step we must take is to develop a certain number of test items which measure the ability of our interest, to confirm their dimensionality, and, selecting a suitable model or models, to find out the operating characteristics of these test items. In so doing, we need a certain norm group of examinees to whom these core test items are administered to obtain the basic data; this process also includes the elimination or modification of unfit test items. After this has been completed, if we wish to add more test items to our item pool, we may develop more test items and estimate their operating characteristics using one of our combinations of an approach and a method. This latter process can also be used in the situation where an item pool already exists and has been used for a long time.

An advantage of this situation in adaptive testing is that we do not need the transformation of ability $\theta$ to $\tau$, which was described in Sections III.8 and V.3, provided that we design our procedure suitably. This will be discussed in Section VII.5.

(VII. 2)  Weakly  Parallel  Tests 

Weakly parallel tests have been introduced (Samejima, 1977b) in contrast to strongly parallel tests in the context of latent trait theory. Two tests are strongly parallel if:

(1) they have the same number of items, and

(2) each item on the first test corresponds to one and only one item on the second test with respect to the identity of the number of item score categories and of the set of operating characteristics of the item score categories.

In contrast, weakly parallel tests are any pair of tests measuring the same ability or latent trait for which the square roots of the test information functions are identical. Thus two weakly parallel tests may have:

(1) different numbers of items, and

(2) no one-to-one correspondence between the two sets of test items with respect to the number of item score categories or to the sets of operating characteristics of the item scores.

It has been pointed out (Samejima, 1977a) that in tailored testing, or computerized adaptive testing, any number of weakly parallel tests can be made by prearranging a certain amount of test information and using it as the criterion for terminating the presentation of items to individual subjects. In such procedures two different item pools are not needed, although two item pools developed for measuring the same ability or latent trait will serve just as well.

(VII.3) Use of the Amount of Test Information as the Criterion for Terminating the Presentation of New Test Items

It has been common for researchers to use a certain degree of convergence of the current ability estimate, obtained after each test item has been presented, as the criterion for terminating the presentation of new items. This procedure, however, results in different levels of accuracy of estimation at different levels of ability, or even at the same level of ability.

For the purpose of illustration, Figure 7-3-1 presents 10 graphs, each of which displays the process of convergence of the maximum likelihood estimate in a simulated tailored testing situation. The ability level of each of these 10 hypothetical examinees is shown by a number on the ordinate and by a horizontal line. The item pool used for this simulation study consists of nine subsets of binary test items following the normal ogive model, whose discrimination and difficulty parameters are shown in Table 7-3-1.

FIGURE 7-3-1

Graphic Presentation of the Change of the Local Maximum Likelihood Estimate After the Presentation of Each New Item for Each of Ten Hypothetical Examinees. Two Sessions Were Administered to Each Examinee, Which Are Shown by Hollow Circles and Solid Triangles, Respectively.

TABLE 7-3-1

Item Discrimination Parameter, $a_g$, and Item Difficulty Parameter, $b_g$, of Each of the Nine Groups of Binary Test Items Used as the Item Pool in the Simulated Tailored Testing.

Item Group    a_g     b_g
    1         1.20   -2.00
    2         1.60   -1.50
    3         2.00   -1.00
    4         1.40   -0.50
    5         1.80    0.00
    6         1.30    0.50
    7         1.70    1.00
    8         1.90    1.50
    9         1.50    2.00

It is assumed that each subset has a sufficiently large number of equivalent test items. There are two sessions for each examinee, which are marked with hollow circles and solid triangles in Figure 7-3-1, respectively. For each examinee in each session, binary items were selected and presented until the test information at the current value of the maximum likelihood estimate had reached 25.0. Since the items are binary, no local maximum likelihood estimate was obtained after administration of the first item. For Subject 1, for instance, the first local maximum likelihood estimate was obtained after administration of the second item in the first session, and after administration of the fifth item in the second session. It is clear from this figure that in some cases the current maximum likelihood estimates converged well before the test information reached 25.0, whereas in other cases they had not yet converged by the time the test information reached 25.0.
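A minimal Python sketch of the quantity this stopping rule monitors follows. It assumes the binary normal ogive item information $I_g(\theta) = [P_g'(\theta)]^2 / \{P_g(\theta)[1 - P_g(\theta)]\}$ and that test information is the sum of the item information over the administered items; the function names are hypothetical. For simplicity it holds θ fixed at 0 instead of re-estimating it after each item, and counts how many items from group 5 of Table 7-3-1 would be needed for the test information to reach 25.0.

```python
import math

def normal_ogive_info(theta, a, b):
    """Item information of a binary normal ogive item: P'(theta)^2 / (P Q)."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))   # P_g(theta)
    q = 0.5 * math.erfc(z / math.sqrt(2.0))    # Q_g(theta) = 1 - P_g(theta)
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (a * density) ** 2 / (p * q)

def items_to_reach(theta, a, b, criterion=25.0):
    """Count equivalent items needed for the additive test information at
    theta to reach the criterion; returns the count and the final information."""
    total, count = 0.0, 0
    while total < criterion:
        total += normal_ogive_info(theta, a, b)
        count += 1
    return count, total

# Group 5 of Table 7-3-1 (a_g = 1.80, b_g = 0.00), evaluated at theta = 0.
count, total = items_to_reach(0.0, 1.80, 0.00)
print(count, round(total, 2))  # 13 26.81
```

Each such item contributes about 2.06 units of information at θ = 0, so twelve items fall just short of 25.0 and the thirteenth crosses it.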

Consider, for example, Subject 4 in the first session (hollow circles) and Subject 10 in the second session (solid triangles). If the rule is that the presentation of new items is terminated when the shift of the current maximum likelihood estimate is less than 0.07 twice in succession, then termination occurs after the presentation of the 9th item in the former case, but not until the presentation of the 14th item in the latter. The corresponding values of test information are 13.370 and 23.137, respectively. The standard error of estimation, which is the inverse of the square root of test information, is 0.273 in the former and 0.208 in the latter, i.e., approximately 76 percent of 0.273. On the other hand, if the rule is that the presentation of new items is terminated when the test information at the current maximum likelihood estimate has reached, say, 25.0, as was the case here, the standard error of estimation will be approximately the same, i.e., 0.20, for all examinees regardless of their ability levels. If estimation of each examinee's ability with the same level of accuracy is desired, there is no doubt that the second rule is better than the first.
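The standard errors quoted for the two rules follow directly from SE = 1/√I. This small Python sketch (the helper name is hypothetical) reproduces them from the information values reported above.

```python
import math

def standard_error(info):
    """Standard error of estimation: the inverse of the square root of test information."""
    return 1.0 / math.sqrt(info)

se_conv_early = standard_error(13.370)   # Subject 4, first session: convergence rule stops after item 9
se_conv_late = standard_error(23.137)    # Subject 10, second session: convergence rule stops after item 14
se_info_rule = standard_error(25.0)      # information rule: the same for every examinee

print(round(se_conv_early, 3), round(se_conv_late, 3), round(se_info_rule, 3))  # 0.273 0.208 0.2
print(round(100 * se_conv_late / se_conv_early))                                   # 76
```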

If the same level of accuracy of estimation is unnecessary, as in selection, it is possible to prearrange a desirable test information function which is not constant over the entire range of ability in question but has a specific shape suited to the specific purpose. This test information function can then be used as the criterion for terminating the presentation of new items. In such a case, examinees at different levels of ability are measured with different levels of accuracy of estimation, and yet the resulting selection will be conducted as accurately as desired if the appropriate information curve is used.

The above are only two examples of many possibilities. In any case, the use of test information functions as the criterion for terminating the presentation of new items in tailored testing permits control of the level of accuracy of estimation to serve the purposes of testing; this is impossible if the convergence of the current maximum likelihood estimate is used as the criterion. The adoption of the test information function as the criterion is therefore strongly recommended over the convergence of the current maximum likelihood estimate, which leaves the accuracy of estimation arbitrary.

(VII.4) Test Information Function and Standard Error of Estimation

One of the many advantages of latent trait theory over classical test theory is that the standard error of measurement is defined more meaningfully, as a function of the latent trait θ. It is defined as the inverse of the square root of the test information function, and is most meaningful when the test information function assumes a value high enough that the conditional distribution of the estimation error, given θ, is approximately normal. When a prearranged value of the test information function is used as the criterion for terminating the presentation of new items in adaptive testing, however, consideration must be given to the relationship between the test information function and the standard error of estimation. Figure 7-4-1 presents this relationship.

As can be seen in this figure, the latter is a strictly decreasing function of the former, and the decrement in the standard error of estimation is conspicuous for the initial increase of the test information function. The standard error is more or less stabilized, however, after the test information function reaches 20.0. For instance, for I(θ) = 6.25 the standard error of estimation is 0.4; this becomes 0.2, i.e., one-half, when I(θ) = 25.0. On the other hand, to make the standard error of estimation one-fourth of 0.4, i.e., 0.1, the test information must be 100.0. This suggests that, in adaptive testing, we must balance the increase in the number of test items against the decrease in the standard error of estimation, and find a suitable criterion.

FIGURE 7-4-1

Functional Relationship between Test Information Function and Standard Error of Estimation. (Abscissa: test information function I(θ).)
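A short Python sketch of the relationship plotted in Figure 7-4-1, assuming only SE = [I(θ)]^{-1/2}. The slope dSE/dI = -(1/2) I^{-3/2}, derived here from the same formula, shows why the curve flattens: at I = 25 one additional unit of information buys only one-eighth of the reduction it buys at I = 6.25. The helper names are hypothetical.

```python
import math

def standard_error(info):
    """Standard error of estimation: 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(info)

def info_for_se(se):
    """Test information needed for a target standard error: I = 1 / SE^2."""
    return 1.0 / se ** 2

def se_slope(info):
    """dSE/dI = -(1/2) I^(-3/2): the gain per unit of information shrinks quickly."""
    return -0.5 * info ** -1.5

for info in (6.25, 25.0, 100.0):
    # SE halves each time the information is multiplied by four: 0.4, 0.2, 0.1
    print(info, standard_error(info), se_slope(info))
```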

(VII.5) Old Test for Item Calibration

It should be noted that, in adaptive testing, we can prearrange the target square root of the test information function and use it as the criterion for terminating the presentation of new items to individual examinees. This target function does not specify a single subtest from the item pool; rather, it provides us with a set of different, individualized subtests. If we repeat this process, we will obtain more than one such set of individualized subtests, which are weakly parallel to one another. Notice that, in spite of this difference from a single fixed test, we may use such a set, or sets, of subtests as our Old Test in estimating the operating characteristics of the discrete item responses to new test items, with the prearranged square root of the test information function over the interval of ability of our interest. This is a remarkable characteristic of the approaches and methods developed in the present research when they are applied to the adaptive testing situation.

It should also be noted that, because of this characteristic, there is no need for us to transform ability θ to τ, since we can prearrange a substantially large constant value for the target square root of the test information function of our Old Test. The process of estimating the operating characteristics of new items therefore becomes much simpler than the one we must use when our Old Test is a fixed test, since under ordinary circumstances it is extremely difficult to develop a fixed test which has a constant amount of test information over the range of ability of our interest.

(VII.6) Adaptive Testing Using Graded Test Items

With the considerations described in the earlier sections, a hypothetical tailored testing situation was constructed using six different item pools. The first item pool consists of eleven types of graded items, each of which has four graded item score categories. Each item follows the normal ogive model, which is given by (3.6), and the three difficulty parameters, $b_{x_g}$ for $x_g$ = 1, 2, 3, for each of the eleven types of graded items are presented in Table 7-6-1.

TABLE 7-6-1

Three Difficulty Parameters for Each of the Eleven Types of Graded Test Items Which Are Common to the Three Different Item Pools.

Item    x_g = 1    x_g = 2    x_g = 3
  1      -3.0       -2.5       -2.0
  2      -2.5       -2.0       -1.5
  3      -2.0       -1.5       -1.0
  4      -1.5       -1.0       -0.5
  5      -1.0       -0.5        0.0
  6      -0.5        0.0        0.5
  7       0.0        0.5        1.0
  8       0.5        1.0        1.5
  9       1.0        1.5        2.0
 10       1.5        2.0        2.5
 11       2.0        2.5        3.0

The discrimination parameters, $a_g$, for these eleven types of items are uniformly 1.0. The second item pool also has eleven types of graded items with the same number of item score categories and the same values of the difficulty parameters, but the common discrimination parameter, $a_g$, is 2.0 instead of 1.0. The third item pool is the same as the first and the second, except that $a_g$ = 3.0. The other three item pools are identical to the first three, except that the items are binary and the difficulty parameters are those shown in the column headed $x_g$ = 2 in Table 7-6-1. It is assumed that each item pool contains a sufficiently large number of items of each type.
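To make the comparison between graded and binary items concrete, the Python sketch below computes item information for item type 6 of Table 7-6-1 (boundaries -0.5, 0.0, 0.5) in a normal ogive graded response model. It takes the information of a graded item as the sum over score categories of $[P_{x_g}'(\theta)]^2 / P_{x_g}(\theta)$, with each category probability the difference of adjacent cumulative normal ogive curves, and the binary counterpart as the same item dichotomized at its middle boundary. Reading (3.6) in this cumulative form is an assumption of the sketch, the function names are hypothetical, and only θ = 0 is evaluated.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def graded_info(theta, a, bs):
    """Item information of a graded item in the normal ogive graded response model.

    bs holds the boundary difficulties b_1 < ... < b_m; categories are x = 0..m
    with P_x = P*_x - P*_{x+1}, where P*_0 = 1 and P*_{m+1} = 0.
    """
    cum = [1.0] + [Phi(a * (theta - b)) for b in bs] + [0.0]
    dcum = [0.0] + [a * phi(a * (theta - b)) for b in bs] + [0.0]
    info = 0.0
    for x in range(len(bs) + 1):
        p = cum[x] - cum[x + 1]      # probability of score category x
        dp = dcum[x] - dcum[x + 1]   # its derivative with respect to theta
        if p > 0.0:
            info += dp * dp / p
    return info

def binary_info(theta, a, b):
    """A binary item is the special case with a single boundary."""
    return graded_info(theta, a, [b])

B6 = [-0.5, 0.0, 0.5]   # item type 6; the binary pools use only its middle boundary
for a in (1.0, 2.0, 3.0):
    g = graded_info(0.0, a, B6)
    bi = binary_info(0.0, a, B6[1])
    print(f"a_g = {a}: graded {g:.3f}, binary {bi:.3f}")
```

Dichotomizing a graded item can only discard information, which is consistent with the graded item being more informative at every discrimination level shown.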

The criterion square root of the test information was set as $[I(\theta)]^{1/2} = 4.65$, the same constant which was used in our original Old Test. This value can also be considered the reasonable compromise suggested in Section VII.4; the corresponding standard error of estimation is approximately 0.215. One hundred hypothetical subjects were used in each tailored testing situation. Their ability levels are -2.475 through 2.475 with an interval of 0.05, i.e., the same set of one hundred ability levels as we used before (cf. Section III.3). In each pair of adaptive testing situations in which the same discrimination parameter was used, the same seed number was used to produce the same sequence of random numbers. The first item presented to every subject was item 6, which is the item of intermediate difficulty. If the subject's item score was 0, then the easiest item, item 1, was presented repeatedly until an item score other than 0 was obtained. If the subject's score on item 6 was the highest possible score, then the most difficult item, item 11, was presented repeatedly until a lower item score was obtained. After that, the tentative maximum likelihood estimate was computed, and the computer presented the item whose information was greatest at that value of θ. This process was repeated until the square root of the test information function at the current maximum likelihood estimate reached the criterion, 4.65.


Tables 7-6-2 through 7-6-4 present the frequency distributions of the number of items needed for individual subjects in the hypothetical tailored testing with the criterion 4.65, in each of the two situations (binary and graded), for $a_g$ = 1.0, 2.0 and 3.0, respectively. A substantial difference between the two frequency distributions is observed. The mean number of items is 36.92 for the binary case and 27.98 for the graded case when $a_g$ = 1.0, indicating that only 75.8 percent of the items were necessary in the graded case as compared to the binary case. These numbers are 11.97 and 7.88 when $a_g$ = 2.0, and 7.38 and 4.56 when $a_g$ = 3.0, and the corresponding percentages are 65.8 and 61.8, respectively. This result indicates the high efficiency of graded test items, in preference to binary test items, in adaptive testing. This is especially true when the discrimination parameters are large.
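The percentages quoted above follow from the mean numbers of items alone; a trivial Python check using only the means reported in the text (the helper name is hypothetical):

```python
# Mean numbers of items (binary, graded) reported for a_g = 1.0, 2.0 and 3.0.
MEANS = {1.0: (36.92, 27.98), 2.0: (11.97, 7.88), 3.0: (7.38, 4.56)}

def graded_percentage(binary_mean, graded_mean):
    """Graded-case mean number of items as a percentage of the binary-case mean."""
    return 100.0 * graded_mean / binary_mean

for a, (binary_mean, graded_mean) in MEANS.items():
    print(a, round(graded_percentage(binary_mean, graded_mean), 1))  # 75.8, 65.8, 61.8
```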


TABLE 7-6-2

Frequency Distribution of the Number of Items Used in Hypothetical Tailored Testing, $a_g$ = 1.0.

Number of Items   Binary   Graded
      27                     15
      28                     74
      29                     10
      30
      31                      1
      32
      33
      34
      35             1
      36            48
      37            31
      38             9
      39             4
      40             4
      41             2
      42             1
   Total           100      100
   Mean          36.92    27.98



TABLE 7-6-3

Frequency Distribution of the Number of Items Used in Hypothetical Tailored Testing, $a_g$ = 2.0.

Number of Items   Binary   Graded
       7                     21
       8                     70
       9                      9
      10             1
      11            26
      12            54
      13            10
      14             3
      15             1
      16
      17
      18             1
   Total            96      100
   Mean          11.97     7.88

TABLE 7-6-4

Frequency Distribution of the Number of Items Used in Hypothetical Tailored Testing, $a_g$ = 3.0.

Number of Items   Binary   Graded
       3                     16
       4                     15
       5             2       66
       6             2        3
       7            55
       8            36
       9             4
   Total            99      100
   Mean           7.38     4.56

(VII.7) Bayesian vs. Maximum Likelihood Estimation in Adaptive Testing

As we have observed in Sections VI.1 and VI.2, the use of a prior in ability estimation introduces biases which we may wish to avoid.

In adaptive testing, it is typical for researchers to use a normal density function as the prior. Figure 7-7-1 presents four functions, i.e., the standard normal density function, n(0,1) (solid line), and three approximations to n(0,1). Each of these three approximations is the product of two functions, $P_i(\theta)$ and $[1 - P_j(\theta)]$, which are given by the normal ogive functions such that

$$P_i(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a_i(\theta - b_i)} e^{-u^2/2}\, du \tag{7.1}$$

and

$$P_j(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a_j(\theta - b_j)} e^{-u^2/2}\, du \tag{7.2}$$

where $a_j = a_i$ and $b_j = -b_i$. These two parameters, $a_i$ and $b_i$, are 0.94810 and -0.35454 for the function drawn by a dotted line in Figure 7-7-1, 0.94980 and -0.35391 for the one drawn by a broken, or long dashed, line, and 0.95259 and -0.35287 for the one drawn by a short dashed line, respectively. These three approximations are obtained by setting the product of the two functions equal to the standard normal density function at θ = 0.3, θ = 0.6 and θ = 0.9, respectively, in addition to θ = 0.0. We notice that these four curves in Figure 7-7-1, including n(0,1), are practically indistinguishable.

FIGURE 7-7-1

Comparison of Three Approximations with the Normal Density Function, n(0,1) (Solid Line). These Approximations Are the Products of a Normal Ogive Function and Another Subtracted From Unity, Which Equal n(0,1) at θ = 0.3 (Dotted Line), θ = 0.6 (Broken Line) and θ = 0.9 (Dashed Line), Respectively. (Abscissa: latent trait.)
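The claim that the product $P_i(\theta)[1 - P_j(\theta)]$ tracks n(0,1) closely can be checked numerically. This Python sketch uses the three parameter pairs given above and assumes $a_j = a_i$ and $b_j = -b_i$; it reports the difference from the standard normal density at θ = 0, at each curve's matching point, and the largest difference on [-3, 3]. Function names are hypothetical.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_density(theta):
    """Standard normal density n(0,1)."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

def prior_approx(theta, a_i, b_i):
    """P_i(theta) * [1 - P_j(theta)] with a_j = a_i and b_j = -b_i."""
    p_i = Phi(a_i * (theta - b_i))
    q_j = Phi(-a_i * (theta + b_i))    # 1 - P_j(theta), since b_j = -b_i
    return p_i * q_j

# (a_i, b_i, matching point) for the dotted, broken and short dashed curves.
CURVES = [(0.94810, -0.35454, 0.3), (0.94980, -0.35391, 0.6), (0.95259, -0.35287, 0.9)]
grid = [-3.0 + 0.01 * k for k in range(601)]

for a_i, b_i, theta_m in CURVES:
    at_zero = prior_approx(0.0, a_i, b_i) - normal_density(0.0)
    at_match = prior_approx(theta_m, a_i, b_i) - normal_density(theta_m)
    worst = max(abs(prior_approx(t, a_i, b_i) - normal_density(t)) for t in grid)
    print(f"a_i={a_i}: diff at 0 {at_zero:+.1e}, at {theta_m} {at_match:+.1e}, max {worst:.4f}")
```

The largest gap, roughly 0.005 near |θ| = 2, is small against a peak density of about 0.399.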

We notice that the formulas in (7.1) and (7.2) are identical with the item characteristic function in the normal ogive model. This implies that the prior, n(0,1), is practically the same as the product of the two operating characteristics of the hypothetical binary items, i and j, for the response pattern (1,0). The Bayes modal estimator with the prior n(0,1) can therefore be considered as the maximum likelihood estimator obtained from the response pattern V plus two additional responses, 1 and 0, to the hypothetical binary items i and j. Note that these two additional item responses are always 1 and 0, regardless of the true ability level.
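A sketch of this equivalence in Python: for a short response pattern on three items of Table 7-3-1, the Bayes modal estimate with prior n(0,1) is found by grid search and compared with the maximum likelihood estimate obtained after appending a response of 1 to item i and 0 to item j (the dotted-line parameters of Figure 7-7-1). The choice of items, the pattern and the grid are illustrative assumptions, and the function names are hypothetical. The last two lines show the practical side of the prior: for an all-correct pattern the likelihood has no interior maximum and the grid search runs to its upper end, while the Bayes mode stays finite.

```python
import math

def log_phi_cdf(z):
    """log of the standard normal distribution function, via erfc (finite in both tails)."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def log_likelihood(theta, items, pattern):
    """Log-likelihood of a binary response pattern under the normal ogive model."""
    total = 0.0
    for (a, b), u in zip(items, pattern):
        z = a * (theta - b)
        total += log_phi_cdf(z) if u == 1 else log_phi_cdf(-z)
    return total

def grid_mode(objective, lo=-4.0, hi=4.0, step=0.001):
    """Theta on a regular grid that maximizes the objective."""
    n = int(round((hi - lo) / step))
    return max((lo + k * step for k in range(n + 1)), key=objective)

ITEMS = [(1.80, 0.00), (1.30, 0.50), (1.70, 1.00)]   # hypothetical: groups 5, 6, 7 of Table 7-3-1
PATTERN = [1, 1, 0]                                   # hypothetical response pattern V
A_I, B_I = 0.94810, -0.35454                          # item i, dotted-line approximation
ITEM_I, ITEM_J = (A_I, B_I), (A_I, -B_I)              # a_j = a_i, b_j = -b_i

def log_post(theta, pattern):
    # log-likelihood plus log n(0,1), up to an additive constant
    return log_likelihood(theta, ITEMS, pattern) - 0.5 * theta * theta

def log_aug(theta, pattern):
    # likelihood of V plus the fixed responses 1 (item i) and 0 (item j)
    return log_likelihood(theta, ITEMS + [ITEM_I, ITEM_J], pattern + [1, 0])

bayes = grid_mode(lambda t: log_post(t, PATTERN))
augmented = grid_mode(lambda t: log_aug(t, PATTERN))
ml_all_correct = grid_mode(lambda t: log_likelihood(t, ITEMS, [1, 1, 1]))
bayes_all_correct = grid_mode(lambda t: log_post(t, [1, 1, 1]))
print(bayes, augmented)                    # nearly the same estimate
print(ml_all_correct, bayes_all_correct)   # grid edge vs. a finite Bayes mode
```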

In order to observe how the prior affects the resulting ability estimation in adaptive testing, a simulation study was conducted (RR-80-3) using the hypothetical item pool described in Section VII.3. We assume eleven hypothetical examinees, whose ability levels are -2.25, -1.75, -1.25, -0.75, -0.25, 0.00, 0.50, 1.00, 1.50, 2.00 and 2.50, respectively. We also assume four different situations: in one of them maximum likelihood estimation is applied for the ability estimation, and in the other three Bayes modal estimation is used, with three different priors, n(0.0,1.0), n(0.0,0.8) and n(0.0,0.5), respectively. In the first situation, that of maximum likelihood estimation, an item from group 5 is always chosen as the first item presented to an examinee, and, depending upon the examinee's response to this item, the second item is chosen either from group 1 or from group 9. That is to say, if the examinee's response to the first item is correct, the second item is chosen from group 9, i.e., the most difficult item group, and if it is incorrect, the second item is chosen from group 1, the easiest item group. The examinee stays with the same item group for the following items until he fails to answer an item correctly if it is group 9, or until he succeeds in answering an item correctly if it is group 1. Thereafter, since every current likelihood function has a local maximum, an item is chosen from the item group whose item information function, $I_g(\theta)$, which is defined by (3.9), is the greatest at the value of the current maximum likelihood estimate, and presented next; this goes on until the amount of test information at the current maximum likelihood estimate reaches or exceeds a certain criterion. All the responses of the hypothetical examinees are generated by the Monte Carlo method.

In Bayesian estimation, the first estimate is the modal point of the prior. The second item is chosen from the item group whose item information function, $I_g(\theta)$, is the greatest at the modal point of the prior, the third item from the item group whose item information function is the greatest at the current Bayes modal estimate, and so forth; the presentation of new items is terminated when the amount of test information at the current estimate of the examinee's ability has reached the same criterion used in maximum likelihood estimation.
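The maximum likelihood branch of this procedure can be sketched in Python as follows. Several details are assumptions of the sketch rather than statements from the study: the termination criterion is set to 25.0 (the value used in Section VII.3), the maximum likelihood estimate is found by bisection on the score function over [-6, 6] (valid because the normal ogive log-likelihood is concave, so its derivative is decreasing), group 4's difficulty is taken as -0.50, a cap of 200 items guards against runaway runs, and the random seed is arbitrary. All function names are hypothetical.

```python
import math
import random

# Item groups of Table 7-3-1: group -> (a_g, b_g); b_4 = -0.50 is assumed.
POOL = {1: (1.20, -2.00), 2: (1.60, -1.50), 3: (2.00, -1.00), 4: (1.40, -0.50),
        5: (1.80, 0.00), 6: (1.30, 0.50), 7: (1.70, 1.00), 8: (1.90, 1.50),
        9: (1.50, 2.00)}
SQRT2 = math.sqrt(2.0)

def p_and_q(theta, a, b):
    """P_g(theta), Q_g(theta) = 1 - P_g(theta) (both via erfc) and z = a(theta - b)."""
    z = a * (theta - b)
    return 0.5 * math.erfc(-z / SQRT2), 0.5 * math.erfc(z / SQRT2), z

def item_info(theta, g):
    a, b = POOL[g]
    p, q, z = p_and_q(theta, a, b)
    dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (a * dens) ** 2 / (p * q) if p * q > 0.0 else 0.0

def score(theta, responses):
    """Derivative of the log-likelihood; decreasing in theta."""
    s = 0.0
    for g, u in responses:
        a, b = POOL[g]
        p, q, z = p_and_q(theta, a, b)
        dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        s += a * dens / p if u == 1 else -a * dens / q
    return s

def mle(responses, lo=-6.0, hi=6.0):
    """Maximum likelihood estimate by bisection on the score (mixed patterns only)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid, responses) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_examinee(theta_true, rng, criterion=25.0, max_items=200):
    responses = []

    def administer(g):
        p, _, _ = p_and_q(theta_true, *POOL[g])
        u = 1 if rng.random() < p else 0       # Monte Carlo response
        responses.append((g, u))
        return u

    first = administer(5)                      # first item always from group 5
    branch = 9 if first == 1 else 1            # hardest group after a success, easiest after a failure
    while all(u == first for _, u in responses):
        if len(responses) >= max_items:
            raise RuntimeError("no mixed response pattern within max_items")
        administer(branch)

    while True:                                # maximum-information selection at the current MLE
        theta_hat = mle(responses)
        info = sum(item_info(theta_hat, g) for g, _ in responses)
        if info >= criterion or len(responses) >= max_items:
            return theta_hat, info, len(responses)
        administer(max(POOL, key=lambda g: item_info(theta_hat, g)))

rng = random.Random(12345)                     # arbitrary seed
ABILITIES = (-2.25, -1.75, -1.25, -0.75, -0.25, 0.00, 0.50, 1.00, 1.50, 2.00, 2.50)
results = [(t,) + simulate_examinee(t, rng) for t in ABILITIES]
for theta_true, est, info, n_items in results:
    print(f"theta {theta_true:+.2f}: estimate {est:+.3f}, I = {info:.2f}, items = {n_items}")
```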

Figure 7-7-2 presents the results of these two ability estimations, obtained by using the prior n(0.0,0.8) and without using any prior, by solid circles and solid triangles, respectively. In this figure, θ = 0.0, and the prior did not interfere with the convergence of the ability estimate.

FIGURE 7-7-2

Successive Maximum Likelihood Estimates (Triangles) and Bayes Modal Estimates (Circles) in the Simulated Tailored Testing with n(0.0,0.8) as the Prior for a Hypothetical Examinee Whose Ability Level Is 0.00.

In contrast to this result, Figure 7-7-3 presents another case, in which θ = -2.25. In this figure, substantial differences are observed between the two processes of maximum likelihood estimation and Bayes modal estimation; in the latter the convergence is much slower, fighting off the effect of the prior. These two examples typically illustrate the bias caused by the prior.


VIII  Constant  Information  Model 

Researchers are often interested in finding out which kinds of test items provide larger amounts of information than others. They seldom pay attention, however, to the fact that there exists some constancy in the amount of item information. In this chapter, we shall observe such aspects of information functions, introduce a new model for the binary test item, which is called the Constant Information Model, and discuss its practical implications and usefulness in the estimation of the operating characteristics of discrete item responses.

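(VIII.1) Constancy of Information under the Transformation of the Latent Trait

Let τ be any strictly increasing transformation of ability θ. The relationships between the two sets of information functions, i.e., $I_{x_g}(\theta)$, $I_g(\theta)$, $I_V(\theta)$ and $I(\theta)$ versus $I^*_{x_g}(\tau)$, $I^*_g(\tau)$, $I^*_V(\tau)$ and $I^*(\tau)$, have been given in Section III.8, while the original definitions of the first set of information functions are given in Section III.4. It should be noted that the area under the curve of the item information function, and that of the test information function, do change with the transformation of ability θ to τ: since $I^*_g(\tau) = I_g(\theta)(d\theta/d\tau)^2$ and $I^*(\tau) = I(\theta)(d\theta/d\tau)^2$, we have the relationships

$$\int_{\underline{\tau}}^{\overline{\tau}} I^*_g(\tau)\, d\tau = \int_{\underline{\theta}}^{\overline{\theta}} I_g(\theta)\, \frac{d\theta}{d\tau}\, d\theta \tag{8.1}$$

and

$$\int_{\underline{\tau}}^{\overline{\tau}} I^*(\tau)\, d\tau = \int_{\underline{\theta}}^{\overline{\theta}} I(\theta)\, \frac{d\theta}{d\tau}\, d\theta \tag{8.2}$$

where $\underline{\theta}$ and $\overline{\theta}$ are the lower and upper endpoints of the range of θ and

$$\underline{\tau} = \tau(\underline{\theta}), \qquad \overline{\tau} = \tau(\overline{\theta}) \tag{8.3}$$

are those of the range of the transformed variable τ.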

If we consider the integration of the square root of each information function, however, we obtain

$$\int_{\underline{\tau}}^{\overline{\tau}} [I^*_g(\tau)]^{1/2}\, d\tau = \int_{\underline{\theta}}^{\overline{\theta}} [I_g(\theta)]^{1/2}\, d\theta \tag{8.4}$$

and

$$\int_{\underline{\tau}}^{\overline{\tau}} [I^*(\tau)]^{1/2}\, d\tau = \int_{\underline{\theta}}^{\overline{\theta}} [I(\theta)]^{1/2}\, d\theta \tag{8.5}$$

Thus the area under the curve of the square root of the item information function, and that of the test information function, are unchanged throughout the transformation of the latent trait by any strictly increasing function, τ(θ).
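Equation (8.4) can be checked numerically. The Python sketch below assumes a normal ogive item with $a_g = 1$ and $b_g = 0$ and a hypothetical strictly increasing transformation τ = 2θ + sin θ. It uses $I^*_g(\tau) = I_g(\theta)(d\theta/d\tau)^2$, inverts τ(θ) by bisection, and integrates on a τ grid with composite Simpson's rule; the range θ ∈ [-10, 10] stands in for the whole line. The area under $[I^*_g(\tau)]^{1/2}$ should match the area under $[I_g(\theta)]^{1/2}$ (both close to π), while the areas under the information functions themselves should not.

```python
import math

A_G, B_G = 1.0, 0.0                       # hypothetical normal ogive item

def sqrt_info(theta):
    """[I_g(theta)]^(1/2) = P_g'(theta) / sqrt(P_g Q_g) for a binary normal ogive item."""
    z = A_G * (theta - B_G)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    dens = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return A_G * dens / math.sqrt(p * q)

def tau_of(theta):                        # hypothetical strictly increasing transformation
    return 2.0 * theta + math.sin(theta)

def dtau_dtheta(theta):                   # 2 + cos(theta) >= 1 > 0
    return 2.0 + math.cos(theta)

def theta_of(tau):
    """Invert tau_of by bisection; the root lies in [(tau - 1)/2, (tau + 1)/2]."""
    lo, hi = (tau - 1.0) / 2.0, (tau + 1.0) / 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tau_of(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sqrt_info_star(tau):
    """[I*_g(tau)]^(1/2) = [I_g(theta)]^(1/2) * dtheta/dtau."""
    theta = theta_of(tau)
    return sqrt_info(theta) / dtau_dtheta(theta)

def simpson(f, lo, hi, n=8000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return s * h / 3.0

LO, HI = -10.0, 10.0                      # stands in for (-inf, inf)
area_sqrt_theta = simpson(sqrt_info, LO, HI)
area_sqrt_tau = simpson(sqrt_info_star, tau_of(LO), tau_of(HI))
area_info_theta = simpson(lambda t: sqrt_info(t) ** 2, LO, HI)
area_info_tau = simpson(lambda t: sqrt_info_star(t) ** 2, tau_of(LO), tau_of(HI))
print(area_sqrt_theta, area_sqrt_tau)     # both close to pi
print(area_info_theta, area_info_tau)     # clearly different
```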

We recall that ability θ was transformed to τ by the polynomial given by (5.13) when we used one of the nine subtests of the original Old Test, i.e., Subtests 1 through 9, as our Old Test (cf. Section VI). The above fact implies that, in so doing, the total area under the square root of the test information function of our Old Test was kept constant.

(VIII.2) Constancy of Item Information for a Specified Model

The finding in the preceding section can be generalized further to the constancy of the area under the square root of the item information function for items which follow the same model, as long as the operating characteristics of an arbitrarily selected test item which belongs to the model can be turned, by a strictly increasing transformation of the latent trait, into those of any other test item which follows the same model. To give an example, suppose that item g has an item characteristic function in the normal ogive model, such that

$$P_g(\theta) = [2\pi]^{-1/2} \int_{-\infty}^{a_g(\theta - b_g)} \exp[-t^2/2]\, dt \tag{8.6}$$

where $a_g\ (>0)$ and $b_g$ are the item discrimination and difficulty parameters, respectively. Suppose that we wish to transform ability θ to τ by the linear transformation

$$\tau = a_g(\theta - b_g)(a^*_g)^{-1} + b^*_g \tag{8.7}$$

where $a^*_g$ is an arbitrary positive constant and $b^*_g$ is any constant. For the item characteristic function of item g resulting from the transformation of θ to τ, we can write

$$P^*_g(\tau) = [2\pi]^{-1/2} \int_{-\infty}^{a^*_g(\tau - b^*_g)} \exp[-t^2/2]\, dt \tag{8.8}$$

It is obvious that $P^*_g(\tau)$ thus obtained belongs to the normal ogive model. From the finding obtained in the preceding section, therefore, the area under the square root of the item information function is unchanged by the transformation of θ to τ. Note that this is true for any arbitrarily chosen values of $a^*_g$ and $b^*_g$, as long as $a^*_g$ is positive. Let h be any other binary test item which also follows the normal ogive model. We can write

$$P_h(\theta) = [2\pi]^{-1/2} \int_{-\infty}^{a_h(\theta - b_h)} \exp[-t^2/2]\, dt \tag{8.9}$$

If we set $a^*_g = a_h$ and $b^*_g = b_h$, then (8.8) provides us with a curve identical with that of (8.9). The area under the square root of the item information function, $[I^*_g(\tau)]^{1/2}$, which equals that under $[I_g(\theta)]^{1/2}$, will therefore equal that under $[I_h(\theta)]^{1/2}$. Thus the constancy of item information holds over any binary test items which belong to the same model, i.e., the normal ogive model.

For the purpose of illustration, Figures 8-2-1 and 8-2-2 present the item information functions and their square roots for three items, all of which belong to the normal ogive model, with $a_1 = 1.0$, $a_2 = 2.0$ and $a_3 = 3.0$, and $b_1 = b_2 = b_3 = 0.0$, respectively. We can see that in Figure 8-2-1 the three areas are substantially different from one another, while those in Figure 8-2-2 are equal.

FIGURE 8-2-1

Item Information Functions of Three Binary Items, Which Follow the Normal Ogive Model, with the Common Difficulty Parameter, $b_1 = b_2 = b_3 = 0.0$, and the Discrimination Parameters, $a_1 = 1.0$ (Solid Curve), $a_2 = 2.0$ (Dotted Curve) and $a_3 = 3.0$ (Dashed Curve), Respectively.

FIGURE 8-2-2

Square Roots of the Item Information Functions of Three Binary Items, Which Follow the Normal Ogive Model, with the Common Difficulty Parameter, $b_1 = b_2 = b_3 = 0.0$, and the Discrimination Parameters, $a_1 = 1.0$ (Solid Curve), $a_2 = 2.0$ (Dotted Curve) and $a_3 = 3.0$ (Dashed Curve), Respectively.
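A numerical version of Figures 8-2-1 and 8-2-2, as a Python sketch: for normal ogive items with $b_g = 0$ and $a_g$ = 1, 2, 3, it integrates both $I_g(\theta)$ and $[I_g(\theta)]^{1/2}$ with composite Simpson's rule over [-10, 10], an assumed stand-in for the whole line. The first set of areas grows in proportion to $a_g$, while the second stays at the same value, π. Function names are hypothetical.

```python
import math

def info_and_sqrt(theta, a, b=0.0):
    """Return (I_g(theta), [I_g(theta)]^(1/2)) for a binary normal ogive item."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    if p * q == 0.0:                      # far tails: information is numerically zero
        return 0.0, 0.0
    root = a * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) / math.sqrt(p * q)
    return root * root, root

def simpson(f, lo, hi, n=8000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return s * h / 3.0

areas = {}
for a in (1.0, 2.0, 3.0):
    area_info = simpson(lambda t: info_and_sqrt(t, a)[0], -10.0, 10.0)
    area_root = simpson(lambda t: info_and_sqrt(t, a)[1], -10.0, 10.0)
    areas[a] = (area_info, area_root)
    print(f"a_g = {a}: area under I_g = {area_info:.4f}, under sqrt(I_g) = {area_root:.6f}")
```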

(VIII.3) Constancy of Item Information for a Set of Models

In this section, we consider only binary test items. Consider a set of test items which follow different models, but whose item characteristic functions are strictly increasing in θ and satisfy

$$\lim_{\theta \to \underline{\theta}} P_g(\theta) = 0 \quad \text{and} \quad \lim_{\theta \to \overline{\theta}} P_g(\theta) = 1 \tag{8.10}$$

Let h denote another, arbitrarily chosen test item which follows a different model satisfying the above two conditions. The transformation of θ to τ such that

$$\tau = P_h^{-1}[P_g(\theta)] \tag{8.11}$$

provides us with the item characteristic function, $P^*_g(\tau)$, for item g with respect to the transformed latent trait τ, which is identical in form with $P_h(\theta)$. The constancy of item information holds, therefore, for item g and item h on the ability scale θ, in spite of the fact that they belong to different models.

Figure 8-3-1 illustrates the square roots of the item information functions of three binary test items, g, h and j, which follow the normal ogive model, the logistic model and the linear model, respectively. The item characteristic functions of items h and j are given as follows:

$$P_h(\theta) = [1 + \exp\{-D a_h(\theta - b_h)\}]^{-1}, \qquad -\infty < \theta < \infty \tag{8.12}$$

$$P_j(\theta) = (\theta - \alpha_j)(\beta_j - \alpha_j)^{-1}, \qquad \alpha_j < \theta < \beta_j \tag{8.13}$$

FIGURE 8-3-1

Square Roots of the Item Information Functions of Items g, h and j, Which Follow the Normal Ogive Model with $a_g = 1.0$ and $b_g = 0.0$ (Dotted Curve), the Logistic Model with D = 1.7, $a_h = 1.0$ and $b_h = 0.0$ (Solid Curve) and the Linear Model with $\alpha_j = -2.5$ and $\beta_j = 2.5$ (Dashed Curve).

The reader is directed to Chapter 3 of RR-79-1 for the relationships among these three models.

It should be noted that the same principle holds for any other set of models sharing a common characteristic of its own, just as the present set of models shares the strictly increasing property of the item characteristic functions and the satisfaction of (8.10). It would be improper, however, to consider a set of models for which the item information function is meaningless, such as the three-parameter logistic or normal ogive models, for the reason the author has pointed out (Samejima, 1973).

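(VIII.4) Exact Area under the Square Root of the Item Information Function

We notice that the common area under the square root of the item information function for all the binary test items whose item characteristic functions are strictly increasing in θ and satisfy (8.10) can be obtained by integrating $[I_g(\theta)]^{1/2}$ for any arbitrarily chosen item g. This area equals π, or approximately 3.14159. The following process is an example, in which the logistic model has been chosen. Since $\frac{d}{d\theta}P_h(\theta) = Da_h P_h(\theta) Q_h(\theta)$ in the logistic model, the item information $I_h(\theta) = [\frac{d}{d\theta}P_h(\theta)]^2 [P_h(\theta) Q_h(\theta)]^{-1}$ reduces to $D^2 a_h^2 P_h(\theta) Q_h(\theta)$, so that

(8.14)  $[I_h(\theta)]^{1/2} = Da_h [P_h(\theta) Q_h(\theta)]^{1/2}$ .

With the change of variable

(8.15)  $\theta^* = [\exp\{Da_h(\theta - b_h)\}]^{1/2}$ ,

(8.16)  $d\theta = 2(Da_h)^{-1} \theta^{*-1} \, d\theta^*$ ,

we obtain

(8.17)  $\int_{-\infty}^{\infty} [I_h(\theta)]^{1/2} d\theta = Da_h \int_{-\infty}^{\infty} [\exp\{Da_h(\theta - b_h)\}]^{1/2} [1 + \exp\{Da_h(\theta - b_h)\}]^{-1} d\theta = Da_h \int_0^{\infty} \theta^* (1 + \theta^{*2})^{-1} \, 2(Da_h)^{-1} \theta^{*-1} d\theta^* = 2 \int_0^{\infty} (1 + \theta^{*2})^{-1} d\theta^* = 2 \tan^{-1} \theta^* \Big|_0^{\infty} = \pi$ .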


It would be just as easy to demonstrate this with the linear model instead of the logistic model (cf. Chapter 4, RR-79-1).
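A quick numerical check that the area is π whatever the model is sketched below for the three items of Figure 8-3-1 (normal ogive, logistic and linear, with the parameters in its caption). Composite Simpson's rule, the finite integration limits, and the tanh change of variable for the linear item (which removes its inverse-square-root singularities at α_j and β_j) are choices made here for illustration, not part of the original method.

```python
import math

D = 1.7  # logistic scaling constant, as in Figure 8-3-1

def sqrt_info_normal_ogive(theta, a=1.0, b=0.0):
    """Normal ogive item g: sqrt(I) = P' / sqrt(P Q), with P' = a * phi(z)."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # erfc keeps both tails accurate
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    dens = a * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return dens / math.sqrt(p * q)

def sqrt_info_logistic(theta, a=1.0, b=0.0):
    """Logistic item h, equation (8.14): sqrt(I) = D a sqrt(P Q)."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return D * a * math.sqrt(p * (1.0 - p))

def sqrt_info_linear(theta, alpha=-2.5, beta=2.5):
    """Linear item j, from (8.13): I = 1 / ((theta - alpha)(beta - theta))."""
    return 1.0 / math.sqrt((theta - alpha) * (beta - theta))

def simpson(f, lo, hi, n=20000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def linear_area(alpha=-2.5, beta=2.5, s_max=15.0):
    """Integrate over (alpha, beta) via theta = mid + half * tanh(s)."""
    half, mid = 0.5 * (beta - alpha), 0.5 * (alpha + beta)
    def g(s):
        theta = mid + half * math.tanh(s)
        return sqrt_info_linear(theta, alpha, beta) * half / math.cosh(s) ** 2
    return simpson(g, -s_max, s_max)

areas = {
    "normal ogive": simpson(sqrt_info_normal_ogive, -12.0, 12.0),
    "logistic": simpson(sqrt_info_logistic, -30.0, 30.0),
    "linear": linear_area(),
}
for name, area in areas.items():
    print(f"{name:12s} {area:.6f}")  # each should be close to pi
```

The agreement is no accident: substituting $u = P(\theta)$ turns every such integral into $\int_0^1 [u(1-u)]^{-1/2} du = \pi$, whichever strictly increasing model supplies $P(\theta)$.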

(VIII.5) Constant Information Model

To represent the type of models which satisfy the two conditions described in Section VIII.3, we shall consider a model which provides us with a constant value for the square root of the item information function over an interval of θ, $[\underline{\theta}, \bar{\theta}]$. Let g denote such a binary test item. It is obvious that the interval $[\underline{\theta}, \bar{\theta}]$ is finite, since the area of the rectangle given by this interval and the constant square root of the item information function, C, is a finite value, i.e., π. Thus we can write

(8.18)  $\bar{\theta} - \underline{\theta} = \pi C^{-1}$ .

Thus the length of the interval of θ depends upon the constant C, the square root of the constant item information.

We find that the model described by

(8.19)  $P_g(\theta) = \sin^2[a_g(\theta - b_g) + (\pi/4)]$

is the one we have looked for, if we set the parameter $a_g$ such that

(8.20)  $a_g = C/2$ ,

with the range of θ such that

(8.21)  $[-\pi a_g^{-1}/4] + b_g \le \theta \le [\pi a_g^{-1}/4] + b_g$ .

Since we have

(8.22)  $Q_g(\theta) = 1 - P_g(\theta) = \cos^2[a_g(\theta - b_g) + (\pi/4)]$ ,

and

(8.23)  $\frac{d}{d\theta} P_g(\theta) = 2\sin[a_g(\theta - b_g) + (\pi/4)] \cos[a_g(\theta - b_g) + (\pi/4)] \, a_g = 2a_g [P_g(\theta) Q_g(\theta)]^{1/2} = C [P_g(\theta) Q_g(\theta)]^{1/2}$ ,

we obtain

(8.24)  $I_g(\theta) = [\frac{d}{d\theta} P_g(\theta)]^2 [P_g(\theta) Q_g(\theta)]^{-1} = C^2$ .

We can see from (8.19) that this model provides us with point-symmetric item characteristic functions with $(b_g, 0.5)$ as the point of symmetry, just like the normal ogive model, the logistic model and the linear model. The parameter $b_g$ can therefore be called the difficulty parameter, just as in the normal ogive and logistic models. It is obvious from (8.23) that the parameter $a_g$ is proportional to the slope of the line tangent to $P_g(\theta)$ at $\theta = b_g$ (in fact, since $P_g(b_g) = Q_g(b_g) = 0.5$, that slope equals $a_g$ itself), just as in these two models, so it can be called the discrimination parameter. The meaning of this parameter is more obvious in (8.20), i.e., in the fact that the amount of item information depends solely upon the parameter $a_g$.

We shall call this model, which is presented by (8.19), the Constant Information Model. This model plays an important role in the estimation of the operating characteristics of item response categories, which will be described in the following section.
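The following sketch evaluates (8.19) and its derivative (8.23) and confirms (8.24) numerically; with $a_g = 0.25$ the constant information is $C^2 = (2a_g)^2 = 0.25$. The function names are hypothetical, and `cim_p` deliberately raises an error outside the interval (8.21), where the model is not defined.

```python
import math

def cim_p(theta, a, b):
    """Constant Information Model ICF, equation (8.19); theta must satisfy (8.21)."""
    half_width = math.pi / (4.0 * a)
    if not (b - half_width <= theta <= b + half_width):
        raise ValueError("theta lies outside the interval (8.21)")
    return math.sin(a * (theta - b) + math.pi / 4.0) ** 2

def cim_slope(theta, a, b):
    """First derivative of (8.19), the middle expression of (8.23)."""
    u = a * (theta - b) + math.pi / 4.0
    return 2.0 * math.sin(u) * math.cos(u) * a

def cim_info(theta, a, b):
    """Item information [P']^2 / (P Q), the left side of (8.24)."""
    p = cim_p(theta, a, b)
    return cim_slope(theta, a, b) ** 2 / (p * (1.0 - p))

a, b = 0.25, 0.0  # C = 2a = 0.5, so C^2 = 0.25 by (8.20) and (8.24)
for theta in (-3.0, -1.4, 0.0, 1.0, 2.6):
    print(f"theta={theta:5.2f}  P={cim_p(theta, a, b):.4f}  I={cim_info(theta, a, b):.6f}")
```

The information column stays at 0.25 across the interval, while the slope at $\theta = b_g$ equals $a_g$, as noted above.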

Figure 8-5-1 presents a few examples of the item characteristic function of the Constant Information Model, together with the corresponding item information functions.

FIGURE 8-5-1

Item Characteristic Functions (Upper Graph) and the Item Information Functions (Lower Graph) of Five Binary Items Following the Constant Information Model, Plotted against the Latent Trait. The Item Parameters Are: $a_1 = 0.25$ and $b_1 = 0.00$ (Smaller Dots), $a_2 = 0.50$ and $b_2 = 0.50$ (Shorter Dashes), $a_3 = 0.75$ and $b_3 = 2.00$ (Larger Dots), $a_4 = 1.0$ and $b_4 = -1.5$ (Longer Dashes), and $a_5 = 2.00$ and $b_5 = 0.50$ (Solid Line).

The item response information function, $I_{x_g}(\theta)$, in the Constant Information Model can be written as

(8.25)  $I_{x_g}(\theta) = 2a_g^2 \sec^2[a_g(\theta - b_g) + (\pi/4)] = 2a_g^2 [Q_g(\theta)]^{-1} > 0$   for $x_g = 0$ ,
        $I_{x_g}(\theta) = 2a_g^2 \csc^2[a_g(\theta - b_g) + (\pi/4)] = 2a_g^2 [P_g(\theta)]^{-1} > 0$   for $x_g = 1$ .

Figure 8-5-2 illustrates these two item response information functions for an item with the parameters $a_g = 0.25$ and $b_g = 0.00$, together with the constant item information function (= 0.25).

FIGURE 8-5-2

Item Response Information Functions of an Item Following the Constant Information Model, with the Parameters $a_g = 0.25$ and $b_g = 0.00$, for $x_g = 0$ (Dotted Curve) and for $x_g = 1$ (Solid Curve), Together with the Constant Item Information Function (Dashed Curve).

From (3.12) and (8.25) we can write for the response pattern information function

(8.26)  $I_V(\theta) = 2 \sum_{x_g \in V} a_g^2 [P_g(\theta)]^{-x_g} [Q_g(\theta)]^{x_g - 1}$ ,

and, finally, the test information function is given by

(8.27)  $I(\theta) = 4 \sum_{g=1}^{n} a_g^2$ .
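A sketch of (8.25) through (8.27) for a hypothetical three-item test follows. It treats the response pattern information as the sum of the item response information over the observed responses, as in (8.26); the item parameters are invented for illustration and chosen so that θ = 0.3 lies inside every item's interval (8.21).

```python
import math

def cim_p(theta, a, b):
    """Constant Information Model ICF, equation (8.19)."""
    return math.sin(a * (theta - b) + math.pi / 4.0) ** 2

def item_response_info(x, theta, a, b):
    """Equation (8.25): 2a^2 / Q for x = 0 and 2a^2 / P for x = 1."""
    p = cim_p(theta, a, b)
    return 2.0 * a * a / (p if x == 1 else 1.0 - p)

def response_pattern_info(pattern, theta, params):
    """Equation (8.26): sum of (8.25) over the responses in the pattern V."""
    return sum(item_response_info(x, theta, a, b)
               for x, (a, b) in zip(pattern, params))

def total_test_info(params):
    """Equation (8.27): 4 times the sum of squared discrimination parameters."""
    return 4.0 * sum(a * a for a, _ in params)

params = [(0.25, 0.0), (0.5, 0.5), (0.75, 0.2)]  # hypothetical (a_g, b_g)
theta = 0.3
print(response_pattern_info([1, 0, 1], theta, params))
print(total_test_info(params))
```

Weighting (8.25) by $Q_g(\theta)$ and $P_g(\theta)$ and adding gives $2a_g^2 + 2a_g^2 = 4a_g^2$ for each item, which is why the expected response pattern information equals the constant test information (8.27).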

(VIII.6) Use of the Constant Information Model for a Set of Equivalent Test Items Which Substitutes for the Old Test

In our combinations of a method and an approach, we need the Old Test, or a set of test items whose operating characteristics are known (cf. Chapter 3). In some situations, however, we may lift this restriction with the effective use of the Constant Information Model.

Suppose,  for  developing  the  new  item  pool,  a  substantial 
number  of  test  items  are  administered  to  a  substantial  number  of 
examinees,  and  there  exists  a  subset  of  equivalent  binary  items 
among  these  items.  In  this  situation,  we  can  use  this  subset  of 
items  as  the  substitute  for  the  Old  Test. 

It has been shown by Birnbaum (Birnbaum, 1968) that, when the test consists of n equivalent binary items, the simple test score t, which is the sum total of the n binary item scores, is a minimal sufficient statistic for θ, i.e., it carries all the information about θ contained in the response pattern V. In such a case, the likelihood equation reduces to

(8.28)  $t = nP_g(\hat{\theta})$ ,

and the maximum likelihood estimate $\hat{\theta}$ is given by

(8.29)  $\hat{\theta} = P_g^{-1}(t/n)$ .

When this common item characteristic function follows the Constant Information Model, we obtain from (8.19) and (8.29)

(8.30)  $\hat{\theta} = a_g^{-1}[\sin^{-1}(t/n)^{1/2} - (\pi/4)] + b_g$ .

It is obvious from (8.21) and (8.30) that the range of $\hat{\theta}$ is given by

(8.31)  $[-\pi a_g^{-1}/4] + b_g \le \hat{\theta} \le [\pi a_g^{-1}/4] + b_g$ .
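Equation (8.30) turns the number-right score directly into $\hat{\theta}$. The sketch below assumes the scale setting $a_g = 0.25$, $b_g = 0.00$ used in this chapter and a hypothetical test of n = 200 equivalent items; `mle_from_score` is an illustrative name.

```python
import math

def mle_from_score(t, n, a=0.25, b=0.0):
    """Equation (8.30): closed-form MLE of theta from the number-right score t
    on n equivalent binary items following the Constant Information Model."""
    if not 0 <= t <= n:
        raise ValueError("score must satisfy 0 <= t <= n")
    return (math.asin(math.sqrt(t / n)) - math.pi / 4.0) / a + b

n = 200
for t in (0, 50, 100, 150, 200):
    print(t, round(mle_from_score(t, n), 4))
```

A score of 0 or n gives the endpoints of (8.31), here -π and π, and t = n/2 gives $b_g$.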

We assume that these equivalent items have a strictly increasing item characteristic function with 0 and 1 as its two asymptotes. As we have seen in previous sections, we can adjust the latent trait scale in such a way that the resulting common item characteristic function for these equivalent items follows the Constant Information Model, which is given by (8.19). Then the response pattern of each examinee with respect to the subset of equivalent binary items is specified, and is summarized in the form of the test score. The origin and unit of the latent trait are set more or less arbitrarily, say, by $a_g = 0.25$ and $b_g = 0.00$. From the test score of the subset of

equivalent  binary  items,  the  maximum  likelihood  estimate  of  the 
examinee’s  ability  is  obtained  through  (8.30)  .  The  resulting  set 
of  the  maximum  likelihood  estimates  for  all  the  examinees  can  be  used 
in  the  same  way  as  we  use  the  set  of  maximum  likelihood  estimates 
obtained  from  the  results  of  the  Old  Test.  The  operating  characteristics 
of  each  of  the  other  items  can  be  estimated  in  the  same  way  as  we  do 
when  we  use  the  Old  Test.  After  this  has  been  done,  we  can  transform 
the  latent  trait  in  whatever  way  we  wish. 

(VIII.7) How to Detect a Subset of Equivalent Binary Items

A natural question is how to detect a subset of equivalent binary items in the tentative item pool. In empirical sciences, it is often difficult to obtain sufficient evidence. The second best way, therefore, is to formulate a set of necessary conditions, and to check our data against each criterion. If we find that our data satisfy all the necessary conditions thus formulated, then we can assume that we have obtained what we wanted, until another necessary criterion becomes available and our data fail to satisfy it.

In our situation, first of all, it is necessary, though not sufficient, that the proportions correct be the same for all the equivalent binary items, within the allowance of sampling fluctuations. This can be checked easily, and we can find a group of binary items which satisfy this condition, if there is any. It is also necessary that the 2 × 2 contingency tables of the bivariate frequency distributions be symmetric and identical among all the pairs of equivalent binary items, within the allowance of sampling fluctuations. This can be checked for every pair of binary items which have passed the first selection, and, possibly, some items will have to be dropped. After this step we can go on to the 2³ contingency tables, the 2⁴ contingency tables, etc., if we wish.
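The first two screening steps can be sketched as follows: proportions correct per item, then the 2 × 2 table for every pair, keeping pairs whose (0,1) and (1,0) cells agree. The fixed tolerance `max_asymmetry` is a hypothetical stand-in for "within the allowance of sampling fluctuations" (a formal test would replace it), and the tiny data set is invented.

```python
from itertools import combinations

def proportions_correct(responses):
    """responses: list of examinee rows, each a list of 0/1 item scores."""
    n_examinees, n_items = len(responses), len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]

def pair_table(responses, g, h):
    """2x2 table [[n00, n01], [n10, n11]]; rows are item g's score, columns item h's."""
    table = [[0, 0], [0, 0]]
    for row in responses:
        table[row[g]][row[h]] += 1
    return table

def screen_pairs(responses, max_asymmetry):
    """Item pairs whose off-diagonal cells n01 and n10 differ by at most
    max_asymmetry examinees (hypothetical tolerance)."""
    kept = []
    for g, h in combinations(range(len(responses[0])), 2):
        (_, n01), (n10, _) = pair_table(responses, g, h)
        if abs(n01 - n10) <= max_asymmetry:
            kept.append((g, h))
    return kept

# Hypothetical data: 6 examinees x 3 items.
responses = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
print(proportions_correct(responses))
print(pair_table(responses, 0, 1))
print(screen_pairs(responses, max_asymmetry=0))
```

Note that $n_{01} = n_{10}$ holds exactly when the two items have the same number of correct answers, so for each pair the symmetry check also covers the first condition.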

Contrary to the common belief favoring high discrimination power, it is desirable that these equivalent items have a low common discrimination parameter, in addition to being substantial in number. A necessary condition for this is that the two frequencies for the response patterns (0,1) and (1,0), which are theoretically the same if the two items are equivalent, should be large, i.e., comparable to the other two frequencies. This can therefore be checked in the same process as that for checking the equivalence of the binary items. Table 8-7-1 illustrates two typical 2 × 2 contingency tables, one of which is for a pair of equivalent binary items which have a common low discrimination parameter, and the other is for a pair of those which have a common high discrimination parameter.

TABLE 8-7-1

Two Typical 2 × 2 Contingency Tables for a Pair of Equivalent Items with a Common Low Discrimination Parameter, and for Those with a Common High Discrimination Parameter, Respectively

Low Discrimination Parameter

                      Item h
  Item g          x_h = 0   x_h = 1   Total
  x_g = 0             110       243     353
  x_g = 1             248       399     647
  Total               358       642    1000

High Discrimination Parameter

                      Item h
  Item g          x_h = 0   x_h = 1   Total
  x_g = 0             300        53     353
  x_g = 1              58       589     647
  Total               358       642    1000
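For a single pair, symmetry of the 2 × 2 table can be tested with McNemar's chi-square statistic, $(n_{01} - n_{10})^2 / (n_{01} + n_{10})$ on one degree of freedom; this particular test is our choice, not one named in the text. The share of discordant responses then shows the low/high discrimination contrast. Applied to the counts of Table 8-7-1:

```python
def mcnemar(table):
    """McNemar's chi-square (1 df) for the symmetry n01 = n10 of a 2x2 table."""
    (_, n01), (n10, _) = table
    return (n01 - n10) ** 2 / (n01 + n10)

def discordant_share(table):
    """Share of examinees with response pattern (0,1) or (1,0)."""
    (n00, n01), (n10, n11) = table
    return (n01 + n10) / (n00 + n01 + n10 + n11)

# Counts from Table 8-7-1: rows x_g = 0, 1; columns x_h = 0, 1.
low = [[110, 243], [248, 399]]
high = [[300, 53], [58, 589]]
CRITICAL_5PCT = 3.841  # upper 5% point of chi-square with 1 df

for name, table in (("low", low), ("high", high)):
    stat = mcnemar(table)
    print(f"{name:4s} McNemar={stat:.3f} symmetric={stat < CRITICAL_5PCT} "
          f"discordant share={discordant_share(table):.3f}")
```

Both tables pass the symmetry check (statistics of about 0.051 and 0.225), while the discordant share is 0.491 for the low-discrimination pair against 0.111 for the high-discrimination pair.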

(VIII.8) Convergence of the Conditional Distribution of the Maximum Likelihood Estimate to Asymptotic Normality When a Test Consists of Equivalent Items

In using the generalized method, we should be aware of a few problems. First of all, the constant test information provided by the subset of equivalent binary items following the Constant Information Model should be substantially large, so that the normal approximation for the conditional distribution of $\hat{\theta}$, given θ, is acceptable. On the other hand, we need a substantially wide range of ability θ for which the test information is constant, in order to make the estimation of the operating characteristics of the other items meaningful. These two are opposing factors, as is obvious from (8.20) and (8.21): raising the constant item information $C^2 = 4a_g^2$ shortens the interval (8.21), whose length is $\pi a_g^{-1}/2$. The solution for this problem is to use a substantially large number of equivalent binary items whose common discrimination parameter is low, as was mentioned in the preceding section, since the test information $4na_g^2$ then grows with n while the interval stays wide.
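The trade-off between (8.20) and (8.21), and its resolution by increasing n, can be tabulated in a few lines. The (n, a_g) combinations below are arbitrary illustrations; the last one, 200 items with $a_g = 0.25$, is included for comparison.

```python
import math

def design(n, a):
    """For n equivalent items with common discrimination a under the Constant
    Information Model: test information (8.27), the asymptotic standard error
    of theta-hat, and the length of the interval (8.21)."""
    info = 4.0 * n * a * a
    return info, 1.0 / math.sqrt(info), math.pi / (2.0 * a)

for n, a in ((10, 1.0), (40, 0.5), (160, 0.25), (200, 0.25)):
    info, se, width = design(n, a)
    print(f"n={n:3d} a={a:4.2f} I={info:6.2f} SE={se:.3f} interval length={width:.3f}")
```

The first three designs all give test information 40 (standard error about 0.158), but the interval of constant information doubles each time $a_g$ is halved and n is quadrupled.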


Another problem is the effect of the range of θ on the speed of convergence of the conditional distribution of $\hat{\theta}$, given θ, to the normal distribution $N(\theta, n^{-1/2}C^{-1})$, whose second parameter is the standard deviation $[I(\theta)]^{-1/2}$. Since the range of $\hat{\theta}$ is a finite interval given by (8.31), it should be expected that the truncation of the conditional distribution makes the convergence slow around the values of θ close to $(-\pi a_g^{-1}/4) + b_g$ and $(\pi a_g^{-1}/4) + b_g$, as is illustrated in Figure 8-8-1. A solution for this problem is again to use a set of equivalent binary items whose common discrimination parameter is low, so that the range of θ is wide enough to include all the examinees far inside the two endpoints of the interval of θ. An alternative solution is to exclude examinees whose $\hat{\theta}$'s are close to $(-\pi a_g^{-1}/4) + b_g$ or $(\pi a_g^{-1}/4) + b_g$. In the second solution, however, the number of examinees will be decreased, and this may affect the accuracy of the estimation of the operating characteristics. It is worth noting that the solution for the first problem is also the solution for the second problem.
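The truncation effect can be quantified directly: $\hat{\theta}$ from (8.30) equals an endpoint of (8.31) exactly when t = 0 or t = n. The sketch assumes $a_g = 0.25$, $b_g = 0.00$ and n = 200 equivalent items, so the endpoints are ±π; `endpoint_masses` is an illustrative name.

```python
import math

def endpoint_masses(theta, n, a=0.25, b=0.0):
    """Probabilities that theta-hat sits exactly at the lower or upper endpoint
    of (8.31), i.e. that the score t is 0 or n on n equivalent items."""
    p = math.sin(a * (theta - b) + math.pi / 4.0) ** 2
    return (1.0 - p) ** n, p ** n

for theta in (-3.0, -2.2, 0.2, 2.6):
    low, high = endpoint_masses(theta, n=200)
    print(f"theta={theta:5.2f}  P(t=0)={low:.4g}  P(t=n)={high:.4g}")
```

At θ = -3.0, only about 0.14 above the lower endpoint, well over half of the probability mass of $\hat{\theta}$ sits exactly at -π, whereas at θ = 0.2 both endpoint probabilities are negligible.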

If there is more than one subset of equivalent binary items within the tentative item pool, we can make full use of all the subsets. We follow the process described earlier for each subset of equivalent binary items, and the resultant estimated operating characteristics can be equated by appropriate transformations of the separately defined latent traits, using, say, the least squares principle, to integrate all of them into one scale.
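One simple reading of this equating step is an ordinary least squares line fit, under two assumptions made here for illustration only: that the transformation between two separately defined latent scales is linear, and that paired values of the same quantities are available on both scales. The text does not fix the form of the transformation, and a monotone nonlinear fit may well be more appropriate; the data below are hypothetical.

```python
def least_squares_line(x, y):
    """Fit y ~ A*x + B by ordinary least squares; returns (A, B)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    A = sxy / sxx
    return A, my - A * mx

# Hypothetical paired values: the same quantities located on the scale defined
# by subset 1 (theta_1) and on the scale defined by subset 2 (theta_2).
theta_1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_2 = [-1.4, -0.45, 0.5, 1.45, 2.4]
A, B = least_squares_line(theta_1, theta_2)
print(A, B)  # theta_2 is approximately A * theta_1 + B
```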


Graphical Illustration of the Conditional Density Functions of the Maximum Likelihood Estimate $\hat{\theta}$, Given the Latent Trait θ.


In order to pursue the process of convergence of the conditional distribution of the maximum likelihood estimate, given ability, to asymptotic normality when a test consists of n equivalent binary test items, a Monte Carlo study was conducted (cf. RR-79-3).



For the common item characteristic function of the hypothetical equivalent binary items, the Constant Information Model with the parameters

(8.32)  $a_g = 0.25$ ,  $b_g = 0.00$

was used. The interval of θ for which the item information function assumes a positive constant is given by

(8.33)  $-\pi < \theta < \pi$ ,

and we have for the amount of item information

(8.34)  $I_g(\theta) = 0.25$ .

As the fixed levels of the latent trait θ, eight positions were selected, i.e., -3.0, -2.2, -1.4, -0.6, 0.2, 1.0, 1.8 and 2.6. A group of one hundred hypothetical examinees was assigned to each of the eight levels of ability θ, making the total number of hypothetical examinees eight hundred. There were twenty hypothetical sessions of testing, and in each session ten equivalent binary items were administered. Each item score (= 0 or 1) was generated by the Monte Carlo method following the Constant Information Model. After the completion of each session, the cumulative test score t was computed for each of the eight hundred hypothetical examinees; thus after the completion of the k-th session the full test score is 10 × k. The maximum likelihood estimate $\hat{\theta}$ was obtained by

(8.35)  $\hat{\theta} = P_g^{-1}[t/(10k)] = 4\sin^{-1}\{[t/(10k)]^{1/2}\} - \pi$

for each hypothetical subject, after the completion of the k-th session. As an example of slow convergence, Figure 8-8-2 illustrates the resultant cumulative frequency ratios of the maximum likelihood estimates of the one hundred hypothetical examinees of Group 1 by step functions, along with the normal distribution functions $N(\theta, \{I(\theta)\}^{-1/2})$, which are drawn by solid curves, after the completion of Sessions 9, 10, 11, 12, 17, 18, 19 and 20, respectively. The same figure also presents, by dotted curves, the corresponding normal distribution functions with the sample mean and standard deviation of $\hat{\theta}$ as the two parameters. We can see in this figure that the two normal distribution functions are still distinctly apart, even after all twenty sessions.

FIGURE 8-8-2 (vertical axis: cumulative frequency ratio)
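The design of this Monte Carlo study can be re-created in a short simulation. This is a sketch under stated assumptions: Python's `random` module, the seed, and the summary printed (sample mean and standard deviation of $\hat{\theta}$ after Session 20, against the asymptotic standard deviation $[I(\theta)]^{-1/2}$ with $I(\theta) = 0.25 \times 200 = 50$) are our choices, so the numbers will not reproduce the author's figures.

```python
import math
import random
import statistics

A_G, B_G = 0.25, 0.0  # parameters (8.32)
LEVELS = [-3.0, -2.2, -1.4, -0.6, 0.2, 1.0, 1.8, 2.6]
N_PER_GROUP, N_SESSIONS, ITEMS_PER_SESSION = 100, 20, 10

def icf(theta):
    """Constant Information Model (8.19) with the parameters (8.32)."""
    return math.sin(A_G * (theta - B_G) + math.pi / 4.0) ** 2

def simulate(seed=1):
    """Return estimates[group][session] = list of theta-hat values via (8.35)."""
    rng = random.Random(seed)
    estimates = []
    for theta in LEVELS:
        p = icf(theta)
        scores = [0] * N_PER_GROUP
        by_session = []
        for k in range(1, N_SESSIONS + 1):
            for i in range(N_PER_GROUP):  # ten Bernoulli(p) item scores per session
                scores[i] += sum(rng.random() < p for _ in range(ITEMS_PER_SESSION))
            n_items = ITEMS_PER_SESSION * k
            # (8.30) with (8.32), i.e. (8.35): 4 asin(sqrt(t / 10k)) - pi
            by_session.append([(math.asin(math.sqrt(t / n_items)) - math.pi / 4.0) / A_G + B_G
                               for t in scores])
        estimates.append(by_session)
    return estimates

est = simulate()
target_sd = 1.0 / math.sqrt(0.25 * ITEMS_PER_SESSION * N_SESSIONS)
for g, theta in enumerate(LEVELS):
    final = est[g][-1]  # after Session 20
    print(f"Group {g + 1}: theta={theta:5.2f} mean={statistics.mean(final):6.3f} "
          f"sd={statistics.stdev(final):.3f} (asymptotic sd {target_sd:.3f})")
```

In runs of this sketch, most estimates of the first group (θ = -3.0) are piled up exactly at -π even after Session 20, which is the truncation that slows convergence there.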

Figure 8-8-3 presents the corresponding set of results for Group 5, as an example of fast convergence. We can see in this figure that the approximation is good enough even after Session 9. For the details of this study, the reader is directed to the research report RR-79-3.


REFERENCES

[1] Birnbaum, A. Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.

[2] Samejima, F. A comment on Birnbaum's three-parameter logistic model in the latent trait theory. Psychometrika, 1973, 38, 221-233.



IX  A  New  Family  of  Models  for  the  Multiple-Choice  Test  Item:  I 

In this chapter, we shall begin summarizing the rationale and findings of the part of the research concerned with a new family of models for multiple-choice test items, which relates to one of the main objectives of the present study. In so doing, we shall introduce the study which the author conducted in Tokyo, Japan, in collaboration with Japanese researchers, including Dr. Sukeyori Shiba and his group of educational psychologists and Dr. Takahiro Sato and his group of educational engineers. For simplicity, in this and the next chapter, this research will be referred to as Tokyo Research.

(IX.1) Mathematical Models and Psychological Reality

Psychometricians pursue methodologies to the extent that some specific, narrowly focused topics may become their life's work. This phenomenon is well exemplified by the large number of papers published in Psychometrika that focus upon various specific topics of factor analysis. Although such work has its own merits, if we are solely satisfied with this type of research, we may overlook a more important aspect of research, i.e., psychological reality. Consequently, our work may not contribute much to the progress of science.

Mathematical models have played an important role in psychology as a science. The validation of mathematical models against psychological reality has attracted less attention from researchers, however. Needless to say, a mathematical model is worth nothing unless it has a sound rationale for representing our psychological reality; only then shall we be able to design and organize our research to obtain, without distortions, meaningful findings and future directions. A researcher's conscience presupposes the virtue of doubt. We cannot emphasize enough that the soundness of the rationale behind any mathematical model and its fitness to our psychological reality are by far the most important matters in our research. For this reason, the author has developed various methods and approaches for estimating the operating characteristics of discrete item responses without assuming any mathematical forms (cf. Chapters 3, 5 and 6). When we are not certain, we may approach the subject without assuming any mathematical models.

(IX.2) Three-Parameter Logistic Model

The three-parameter logistic model (Birnbaum, 1968) has been widely used for the multiple-choice test item by psychometricians and other researchers in mental measurement. The model is based upon the knowledge or random guessing principle, i.e., the examinee either knows the answer, or guesses randomly and picks an arbitrary alternative. Let $\Psi_g(\theta)$ be the item characteristic function in the logistic model, which is given by (8.12). The three-parameter logistic model is defined by the item characteristic function

(9.1)  $P_g(\theta) = (1 - c_g)\Psi_g(\theta) + c_g$ ,

where $c_g$ is the third parameter, which is called the guessing parameter. In spite of the popularity of the model, very few researchers have tried to validate, or invalidate, the model with their own data.
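Equation (9.1) in code, with D = 1.7 as in (8.12); the parameter values are illustrative. The value c = 1/m is what the knowledge or random guessing principle implies for m equally attractive alternatives, the benchmark used in the discussion of the pseudo-guessing parameter.

```python
import math

D = 1.7

def psi(theta, a, b):
    """Two-parameter logistic ICF, (8.12)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def p_3pl(theta, a, b, c):
    """Equation (9.1): P = (1 - c) * Psi + c; c is the lower asymptote."""
    return (1.0 - c) * psi(theta, a, b) + c

m = 5          # five-choice item, for illustration
c = 1.0 / m    # random guessing among m alternatives
for theta in (-6.0, -2.0, 0.0, 2.0):
    print(theta, round(p_3pl(theta, a=1.0, b=0.0, c=c), 4))
```

At $\theta = b_g$ the model gives $(1 + c_g)/2$, and as θ decreases it approaches $c_g$ rather than 0.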

It is common among experienced test constructors to include wrong, but plausible, answers among the alternatives of a multiple-choice item, which are called distractors, so as not to make its correct answer too conspicuous and thereby destroy the quality of the question. It is noted that we need some higher mental processes other than random guessing to recognize the plausibility of a distractor and to be attracted to it. It is contradictory, therefore, to apply the three-parameter normal ogive, or logistic, model to multiple-choice items with such distractors, although many researchers seem to like the model.

The third parameter of the three-parameter logistic model, $c_g$, is often called the pseudo-guessing parameter, and its estimate tends to be less than unity divided by the number of alternatives (e.g., Lord, 1968). This fact in itself invalidates the model, although many researchers do not admit it. It is apparent that something other than random guessing is included in our psychological reality, which makes us choose wrong answers in preference to the correct answer. Some other model, or models, which fit our psychological reality better are desirable.

(IX.3) Tokyo Research

In the summer of 1979, the author spent a few weeks in Tokyo, Japan, with the support of the Office of Naval Research, and had conferences with researchers in Japan. The scientific monograph published in 1980 (cf. Chapter 2) with the help of Dr. Rudolph J. Marcus, Scientific Director of the ONR Tokyo Office, is based upon this research. The researchers with whom the author had conferences include Dr. Takahiro Sato and Dr. Sukeyori Shiba. The author had two more opportunities to confer with them, in the summers of 1980 and 1981. Among other activities, Dr. Shiba and the author started a long-term research collaboration in 1979, which concerns his word comprehension tests and mathematical models for multiple-choice test items. It will eventually incorporate the author's methods and approaches for estimating the operating characteristics of discrete item responses in a large-scale empirical study.

In Section IX.4, a brief introduction to Sato's research on Index k will be given. Shiba's research and his word comprehension tests will be introduced in Sections X.1 through X.3 of the next chapter.

(IX.4) Sato's Index k

Let g (= 1, 2, ..., n) be a multiple-choice test item. In this section, however, the symbol g is omitted whenever it is clear that we deal with only one item. Let i (= 1, 2, ..., m) be an alternative, or an option, of the multiple-choice item g, and $p_i$ be the probability with which the examinee selects the alternative i. The entropy H is defined as the expectation of $-\log_2 p_i$, such that

(9.2)  $H = -\sum_{i=1}^{m} p_i \log_2 p_i$ ,

for the set of m alternatives of item g. It is obvious from (9.2) that the entropy H is non-negative, and, if one of the m alternatives is the sure event with unity as its probability, then H = 0. Sato's Index k is defined by

(9.3)  $k = 2^H$ ,

and is used as an index of the effectiveness of the set of m alternatives for item g in the context of information theory. Since the entropy H indicates the expected uncertainty of the set of m events, or alternatives, the set of alternatives is more informative for a greater value of k.

When the probability $p_i$ is replaced by the frequency ratio, $P_i$, we can write for the estimate of the entropy

(9.4)  $\hat{H} = -\sum_{i=1}^{m} P_i \log_2 P_i$ ,

and for the estimate of k we have

(9.5)  $\hat{k} = 2^{\hat{H}}$ .


We notice that we can obtain the number of hypothetical, equivalent alternatives k without using the entropy, for we have

(9.6)  $k = 2^{-\sum_{i=1}^{m} p_i \log_2 p_i} = \prod_{i=1}^{m} p_i^{-p_i} = [\prod_{i=1}^{m} p_i^{p_i}]^{-1}$ .


The quantity in the brackets of the last expression of (9.6) is a kind of weighted geometric mean of the $p_i$'s. Equation (9.6) also implies that we can use any base for log instead of 2. For convenience, hereafter we shall use e as the base of log $p_i$, and use H* instead of H, such that

(9.7)  $H^* = -\sum_{i=1}^{m} p_i \log p_i \ge 0$ ,

which equals zero when one of the alternatives is the sure event, and

(9.8)  $k = e^{H^*} \ge 1$ ,

and we shall simply write log $p_i$ instead of $\log_e p_i$.
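The estimates (9.4) and (9.5) and the equivalent forms (9.6) and (9.8) can be computed side by side from raw choice counts. The counts are hypothetical, and treating 0 log 0 as 0 for unchosen alternatives is a standard convention assumed here.

```python
import math

def sato_k(counts):
    """Estimate Sato's Index k from the number of examinees choosing each alternative."""
    total = sum(counts)
    ratios = [c / total for c in counts if c > 0]   # 0 * log 0 taken as 0
    h2 = -sum(p * math.log2(p) for p in ratios)     # (9.4), base 2
    h_e = -sum(p * math.log(p) for p in ratios)     # (9.7), base e
    product = math.prod(p ** p for p in ratios)     # bracket in (9.6)
    return 2.0 ** h2, math.exp(h_e), 1.0 / product  # (9.5), (9.8), (9.6)

print(sato_k([100, 40, 30, 20, 10]))  # three equal estimates of k
print(sato_k([40, 40, 40, 40, 40]))   # equally chosen: k equals m = 5 up to rounding
print(sato_k([200, 0, 0, 0, 0]))      # one sure alternative: k = 1
```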

To find the value of $p_i$ which maximizes H*, and hence k, we define Q such that

(9.9)  $Q = -\sum_{i=1}^{m} p_i \log p_i + \lambda (\sum_{i=1}^{m} p_i - 1)$ ,

where λ is Lagrange's multiplier. Thus the partial derivative of Q with respect to $p_i$ is given by

(9.10)  $\frac{\partial Q}{\partial p_i} = -\log p_i - p_i(1/p_i) + \lambda = -\log p_i + (\lambda - 1)$ .

Setting this derivative equal to zero, we obtain

(9.11)  $\log p_i = \lambda - 1$ ,

which is a constant regardless of the value of i. Since we have

(9.12)  $\sum_{i=1}^{m} p_i = 1$ ,

we obtain

(9.13)  $p_i = 1/m$ .

Thus it is clear that H*, and hence k, is maximal when all the m alternatives are equally probable, and we can write

(9.14)  $\max(H^*) = \log m$

and

(9.15)  $\max(k) = m$ .

Since in the present situation the m events are alternatives, the values of H* and k are affected by the difficulty level of item g. Let R be the correct answer to item g, which is given as one of its alternatives, and $p_R$ be the probability with which the examinee selects the correct answer R. Figure 9-4-1 presents the relationship between the probability $p_R$ and the number of hypothetical, equivalent alternatives k. In this figure, the area marked by slanted lines indicates the set of k's which are less than $\max(k \mid p_R)$ and greater than $\max[1/p_R, \min(k \mid p_R)]$, and which are considered to be reasonable values of k by Sato and others. In practice, Figure 9-4-1 is used by replacing the probability $p_R$ by the proportion correct, $P_R$, and the number of hypothetical, equivalent alternatives, k, by its estimate $\hat{k}$.

FIGURE 9-4-1

Relationship between the Probability with Which the Correct Answer R Is Selected (Horizontal Axis: Probability for Correct Answer) and the Number of Hypothetical, Equivalent Alternatives, for Five-Choice Items (Sato's Data)
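The boundaries of the shaded band in Figure 9-4-1 can be computed under one assumption made here: that $\max(k \mid p_R)$ and $\min(k \mid p_R)$ are the extremes of k over all ways of dividing $1 - p_R$ among the $m - 1$ distractors. The entropy is then largest when the distractors are equally chosen and smallest when a single distractor takes all of $1 - p_R$. The sketch uses m = 5 to match the five-choice items of the figure.

```python
import math

def k_bounds(p_r, m=5):
    """Extremes of Sato's k for a fixed probability p_r (0 < p_r < 1) of the
    correct answer on an m-choice item."""
    q = 1.0 - p_r
    h_max = -p_r * math.log(p_r) - q * math.log(q / (m - 1))  # distractors equally chosen
    h_min = -p_r * math.log(p_r) - q * math.log(q)            # one distractor takes all
    return math.exp(h_min), math.exp(h_max)

def reasonable_band(p_r, m=5):
    """Band max[1/p_r, min(k | p_r)] < k < max(k | p_r) described for Figure 9-4-1."""
    k_min, k_max = k_bounds(p_r, m)
    return max(1.0 / p_r, k_min), k_max

for p_r in (0.3, 0.5, 0.8):
    lo, hi = reasonable_band(p_r)
    print(f"p_R={p_r:.1f}  band: {lo:.3f} < k < {hi:.3f}")
```

For example, at $p_R = 0.5$ the band runs from 2 to 4.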

(IX.5) Index k* for the Validation Study of the Three-Parameter Logistic Model

Sato's Index k takes on a high value if every examinee in the group has selected one of the m alternatives at random. This fact implies that, although the index was introduced for quite an opposite purpose and has proved its usefulness, it may also be useful in detecting the examinee's random guessing behavior in quite a different situation, i.e., multiple-choice testing. In so doing, it will be more convenient if we can modify Sato's Index k in such a way that it is unaffected by the ability distribution of a specific population of examinees, and can be considered as a pure property of the item. With this aim in mind, we shall introduce a new index, Index k*.

Let  A  be  the  event  that  the  examinee  does  not  know  the 
answer  to  item  g  ,  and  consider  the  probability  space  which  consists 
of  such  a  subpopulation  of  examinees.  The  conditional  probability, 
p(i|A)  ,  with  which  the  examinee  selects  the  alternative  i  of  item 
g  in  this  conditional  probability  space  is  given  by 


f*  vll pi  +  Pr1"1 

iVR 

(9.16) 

p(i|a) \ 

Pr'J/1*  Pr1"1  ' 

1-R 

where p_R* denotes the probability with which the examinee guesses correctly for item g. The new index, k*, is defined in terms of these conditional probabilities in such a way that

$$
k^* = \exp\Big[-\sum_{i=1}^{m} p(i \mid A)\,\log p(i \mid A)\Big]
    = \Big[\prod_{i=1}^{m} p(i \mid A)^{\,p(i \mid A)}\Big]^{-1}.
\tag{9.17}
$$

It is obvious that p(i|A) for i ≠ R is proportional to p_i, because every examinee in the population who has selected one of the wrong answers does not know the answer and, consequently, is also in the subpopulation A. On the other hand, examinees who have selected the correct answer R are not necessarily in the subpopulation A, so we can write

$$
p_R^* \le p_R .
\tag{9.18}
$$


Note that, if the examinee's behavior follows the knowledge or random guessing principle and the item characteristic function of the multiple-choice item g is one of the three-parameter models, then an examinee who does not know the answer selects each of the m alternatives with the same probability, 1/m. Hence p_R* equals p_i for every i ≠ R; as a result, all m of the p(i|A)'s are equal and k* = m.
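As a small numerical illustration of (9.16) and (9.17), the sketch below evaluates k* from population probabilities that the user supplies. It is a minimal Python sketch, not the author's program: the function names are hypothetical, alternatives are indexed from 0, and p_R* is assumed to be known. When p_R* equals every wrong-answer probability, as under the knowledge or random guessing principle, it returns m.

```python
import math


def conditional_probs(p, r, p_r_star):
    """Equation (9.16): p(i|A) for every alternative i.

    p        -- selection probabilities p_i of the m alternatives (sum to 1)
    r        -- index of the correct answer R
    p_r_star -- probability p_R* of not knowing the answer and choosing R
    """
    # P(A) is the total of the wrong-answer probabilities plus p_R*.
    prob_a = 1.0 - p[r] + p_r_star
    return [(p_r_star if i == r else pi) / prob_a for i, pi in enumerate(p)]


def index_k_star(p, r, p_r_star):
    """Equation (9.17): k* = exp(entropy of the p(i|A)'s)."""
    q = conditional_probs(p, r, p_r_star)
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return math.exp(entropy)


# Knowledge or random guessing, m = 5: p_R* equals each wrong p_i (0.1).
k_equivalent = index_k_star([0.6, 0.1, 0.1, 0.1, 0.1], r=0, p_r_star=0.1)
# One dominant distractor: the p(i|A)'s are unequal and k* falls below m.
k_unequal = index_k_star([0.6, 0.25, 0.05, 0.05, 0.05], r=0, p_r_star=0.1)
print(k_equivalent, k_unequal)
```

The first value equals 5 up to floating-point rounding; the second is noticeably smaller than m.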

In practice, we need to use estimates of the p(i|A)'s to obtain an estimate of k*. Since the frequency ratio P_i serves as the estimate of p_i for i ≠ R, all we need to do is to find an appropriate estimate of p_R*. Let P_R* denote such an estimate of p_R*, and let P_i* be such that

$$
P_i^* =
\begin{cases}
P_i, & i \neq R,\\
P_R^*, & i = R.
\end{cases}
\tag{9.19}
$$

Then we can write the estimate of p(i|A) as

$$
\hat{p}(i \mid A) = P_i^* \Big[\sum_{s=1}^{m} P_s^*\Big]^{-1}.
\tag{9.20}
$$

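We take the strategy of choosing the P_R* which makes the estimate of k* maximal. Since the logarithm is monotone, this is the same as maximizing the estimated entropy H*, defined such that

$$
\hat{H}^* = \log \hat{k}^*
 = -\sum_{i=1}^{m} \hat{p}(i \mid A)\,\log \hat{p}(i \mid A)
 = -\Big[\sum_{s=1}^{m} P_s^*\Big]^{-1}
   \Big[\sum_{i=1}^{m} P_i^*\log P_i^* - \Big(\sum_{i=1}^{m} P_i^*\Big)\log\Big(\sum_{s=1}^{m} P_s^*\Big)\Big].
\tag{9.21}
$$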

Then the partial derivative of the estimated H* with respect to P_R* can be written as

$$
\frac{\partial \hat{H}^*}{\partial P_R^*}
 = \Big[\sum_{s=1}^{m} P_s^*\Big]^{-2}
   \Big[\sum_{i=1}^{m} P_i^*\log P_i^* - \Big(\sum_{i=1}^{m} P_i^*\Big)\log P_R^*\Big],
\tag{9.22}
$$


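and, setting this equal to zero and noting that the term P_R* log P_R* appears on both sides and cancels, we obtain

$$
\log P_R^* = \Big[\sum_{s \neq R} P_s\Big]^{-1} \sum_{i \neq R} P_i \log P_i ,
\tag{9.23}
$$

and then

$$
P_R^* = \prod_{i \neq R} P_i^{\,P_i \left[\sum_{s \neq R} P_s\right]^{-1}} ,
\tag{9.24}
$$

i.e., P_R* is the geometric mean of the wrong-answer frequency ratios, each weighted by its own P_i.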
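The step from (9.22) to (9.24) can be checked numerically: for fixed wrong-answer ratios, the estimated entropy should peak exactly at the weighted geometric mean given by (9.24). The Python sketch below uses the wrong-answer ratios of item 4 in Table 9-6-2 and a brute-force grid search; the function names, grid range and step are arbitrary choices for illustration only.

```python
import math


def estimated_entropy(p_wrong, p_r_star):
    """Equations (9.19)-(9.21): estimated H* for a trial value of P_R*."""
    pstar = list(p_wrong) + [p_r_star]  # P_i* for the wrong answers, then P_R*
    total = sum(pstar)
    return -sum((x / total) * math.log(x / total) for x in pstar)


def closed_form_p_r_star(p_wrong):
    """Equation (9.24): P_i-weighted geometric mean of the wrong-answer ratios."""
    w = sum(p_wrong)
    return math.exp(sum(x * math.log(x) for x in p_wrong) / w)


# Wrong-answer frequency ratios (alternatives B-E) of item 4 in Table 9-6-2.
p_wrong = [0.104, 0.078, 0.130, 0.082]
closed = closed_form_p_r_star(p_wrong)

# Brute-force maximisation of H* over trial values 0.00100 ... 0.59999.
grid = [j * 1e-5 for j in range(100, 60000)]
best = max(grid, key=lambda x: estimated_entropy(p_wrong, x))
print(round(closed, 5), round(best, 5))
```

Because the bracket in (9.22) reduces to a decreasing function of P_R*, the entropy has a single maximum, so the grid search and the closed form agree to within the grid step.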

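Thus we can use (9.24) in (9.19) and, therefore, obtain the estimate of p(i|A) through (9.20). The estimate of the new index, k*, is given by

$$
\hat{k}^* = \exp\Big[-\sum_{i=1}^{m} \hat{p}(i \mid A)\,\log \hat{p}(i \mid A)\Big]
 = \Big[\prod_{i=1}^{m} \hat{p}(i \mid A)^{\,\hat{p}(i \mid A)}\Big]^{-1}.
\tag{9.25}
$$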
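Equations (9.19) through (9.25) reduce to a few lines of arithmetic. The following Python sketch (function name hypothetical) estimates H* and k* from observed frequency ratios and, as a check, applies it to the frequency ratios of Table 9-6-2; in our own recomputation the results agree with Table 9-6-3 to about four decimal places.

```python
import math


def estimate_k_star(freq_ratios, r):
    """Estimate (H*, k*) from observed frequency ratios P_i, eqs. (9.19)-(9.25).

    freq_ratios -- proportion of examinees choosing each alternative
    r           -- index of the correct answer R
    """
    wrong = [x for i, x in enumerate(freq_ratios) if i != r]
    w = sum(wrong)
    if w <= 0:
        raise ValueError("at least one wrong answer must have been chosen")
    # (9.24): P_R* is the P_i-weighted geometric mean of the wrong-answer ratios.
    p_r_star = math.exp(sum(x * math.log(x) for x in wrong if x > 0) / w)
    # (9.19)-(9.20): substitute P_R* for the correct answer and renormalise.
    pstar = [p_r_star if i == r else x for i, x in enumerate(freq_ratios)]
    total = sum(pstar)
    q = [x / total for x in pstar]
    # (9.21) and (9.25): estimated entropy H* and k* = exp(H*).
    h = -sum(x * math.log(x) for x in q if x > 0)
    return h, math.exp(h)


# Frequency ratios of alternatives A-E (A correct) from Table 9-6-2.
TABLE_9_6_2 = {
    1: [0.608, 0.086, 0.106, 0.100, 0.100],
    2: [0.618, 0.102, 0.080, 0.106, 0.094],
    3: [0.600, 0.094, 0.106, 0.108, 0.092],
    4: [0.606, 0.104, 0.078, 0.130, 0.082],
    5: [0.598, 0.092, 0.100, 0.104, 0.106],
}

results = {item: estimate_k_star(ratios, r=0) for item, ratios in TABLE_9_6_2.items()}
for item, (h, k) in results.items():
    print(f"item {item}: H* = {h:.5f}, k* = {k:.5f}")
```

Note that only the proportions enter the computation, so the same function applies unchanged to samples of any size.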


A necessary, though not sufficient, condition for one of the three-parameter models to be valid is that k* should be equal to m within sampling fluctuations, regardless of the population of examinees from which our sample happened to be selected. If this is not the case, we must say that the three-parameter model does not fit our item, i.e., the model is invalidated.

(IX. 6)  Simulation  Study  on  Index  k* 

For the purpose of illustration, a set of simulated data was generated using the Monte Carlo method. In this set of data, five hypothetical multiple-choice test items were assumed, each having five alternatives, A, B, C, D and E, with A always as the correct answer. Each item is assumed to follow the three-parameter normal ogive model, and its parameter values are shown in Table 9-6-1. A group of five hundred hypothetical examinees was assumed, whose ability levels are placed at one hundred equally spaced points on the ability continuum, starting with -2.475 and ending with 2.475, in such a way that subjects 1 through 5 are placed at θ = -2.475, subjects 6 through 10 are at θ = -2.425, and so on. For each of the five hypothetical multiple-choice items, the response of each of the five hundred hypothetical examinees was generated according to the specified item characteristic function with the knowledge or random guessing principle.

TABLE 9-6-1

Item Discrimination Parameter a_g and Item Difficulty Parameter b_g of Each of the Five Hypothetical, Binary Items Following the Three-Parameter Normal Ogive Model, with c = 0.2.

  Item     a_g     b_g
    1      1.00    0.00
    2      1.30    0.00
    3      2.00    0.00
    4      2.30    0.00
    5      3.30    0.00
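The data-generating scheme just described can be sketched in Python as follows. This is an illustrative reconstruction, not the original Monte Carlo program: the function names, the random seed and the discrimination values are assumptions (every b is set to 0), and the normal ogive is written without a scaling constant. An examinee who "knows" the answer chooses A; otherwise he picks one of the five alternatives at random, which yields the lower asymptote c = 1/m = 0.2.

```python
import math
import random

M = 5  # alternatives A-E; index 0 (A) is the correct answer


def normal_ogive(z):
    """Standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_item(a, b, thetas, rng):
    """Frequency ratio of each alternative for one simulated item.

    Knowledge or random guessing: with probability Phi(a(theta - b)) the
    examinee knows the answer and chooses A; otherwise he chooses one of the
    M alternatives at random, so P(A) = 1/M + (1 - 1/M) Phi(a(theta - b)).
    """
    counts = [0] * M
    for theta in thetas:
        if rng.random() < normal_ogive(a * (theta - b)):
            counts[0] += 1
        else:
            counts[rng.randrange(M)] += 1
    return [c / len(thetas) for c in counts]


# 100 equally spaced ability points from -2.475 to 2.475, five examinees each.
points = [-2.475 + 0.05 * j for j in range(100)]
thetas = [t for t in points for _ in range(5)]

rng = random.Random(1981)  # arbitrary seed, for reproducibility only
# Illustrative (a, b) pairs; not necessarily the values of Table 9-6-1.
ITEM_PARAMS = [(1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (2.5, 0.0), (3.0, 0.0)]
simulated = [simulate_item(a, b, thetas, rng) for a, b in ITEM_PARAMS]
for g, ratios in enumerate(simulated, start=1):
    print(g, [round(x, 3) for x in ratios])
```

Because the ability points are symmetric about 0 and b = 0, the expected proportions are exactly 0.6 for A and 0.1 for each distractor; the printed ratios differ from these only by sampling fluctuation.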

Table 9-6-2 presents the frequency ratio, P_i, of each of the five alternatives for each of the five hypothetical multiple-choice items. We can see that sampling fluctuations are fairly large for item 4, and to a lesser degree for item 2, given that the corresponding theoretical probability, p_i, is 0.6 for alternative A and 0.1 for each of the alternatives B, C, D and E. The same table also presents the values of P_R*, which were obtained through (9.24). Using these values in (9.19) through (9.21) and (9.25), the estimates of the entropy H* and of Index k* were obtained; they are presented in Table 9-6-3. Since the maximal possible value of the estimated H* is approximately 1.60944 (= log 5) and that of the estimated k* is 5 (= m), we can say that these results are sufficiently close to their respective maximal values, i.e., they exemplify the satisfaction, by our simulated data, of one of the necessary conditions for validating the three-parameter normal ogive model and the knowledge or random guessing principle.

The fact that these results are less satisfactory for item 4 and, to a lesser degree, for item 2 must be due to the sampling fluctuations observed in Table 9-6-2.

TABLE 9-6-2

Frequency Ratio of the Subjects, P_i, Who Selected Each of the Five Alternatives, and the Modified Frequency Ratio P_R* for the Correct Answer A, for Each of the Five Hypothetical Items.

                       Alternative
  Item      A      B      C      D      E       P_R*
    1     .608   .086   .106   .100   .100    .098
    2     .618   .102   .080   .106   .094    .096
    3     .600   .094   .106   .108   .092    .100
    4     .606   .104   .078   .130   .082    .101
    5     .598   .092   .100   .104   .106    .101

TABLE 9-6-3

Entropy, H*, and the Number of Hypothetical, Equivalent Alternatives, k*, for Each of the Five Hypothetical Items Following the Three-Parameter Normal Ogive Model.

  Item       H*          k*
    1     1.60714     4.98853
    2     1.60501     4.97789
    3     1.60744     4.99000
    4     1.59224     4.91475
    5     1.60829     4.99424

For  the  detail  of  this  study,  the  reader  is  directed  to 
ONR-Tokyo  Scientific  Monograph  3,  Chapter  5. 

(IX. 7)  Iowa  Tests  of  Basic  Skills

Concerning the validation of mathematical models, an empirical study was conducted using test data provided by Dr. William Coffman of the University of Iowa, who is also Director of the Iowa Testing Programs. For simplicity, hereafter, we shall call them the Iowa Data, and this part of the research the Iowa Study. The data analysis for this part of the research was carried out through the persistent effort of one of the author's assistants, Robert Trestman.

The battery of tests used here is the Iowa Tests of Basic Skills, Form 6, Levels 9-14. These tests have been designed, constructed, and revised at the College of Education of the University of Iowa since 1935, with the general school population in mind, for students of ages nine through fourteen, or grades three through nine. All technical information in this paper has been taken from either Form 6 itself (Hieronymous and Lindquist, 1971) or its Teacher's Manual (Iowa Basic Skills Testing Program, 1971).

There are eleven tests in the battery, each of which focuses on a different basic skill. For convenience, hereafter, we shall call these separate tests subtests, in order to avoid the confusion which might occur when we refer to both the total test battery and each test in the battery. Following the Teacher's Manual, the descriptions and abbreviations of these eleven subtests, together with their administration schedule and working times, are tabulated in Table 9-7-1. All the test items are power test items in multiple-choice format, with five alternative answers for the items in Subtest L1, and with four alternatives for those in the other ten subtests. These eleven subtests are designed to cover all major areas of academic interest for grades three through nine.

TABLE 9-7-1

Administration Sessions, Time Limits and Subtests of Iowa Tests of Basic Skills.

  Administration     Working Time
  Session            (Minutes)      Subtest
  First Session          17         V:   Vocabulary
  (85 Minutes)           55         R:   Reading Comprehension
                         12         L-1: Spelling
  Second Session         15         L-2: Capitalization
  (80 Minutes)           20         L-3: Punctuation
                         20         L-4: Usage
  Third Session          30         W-1: Map Reading
  (85 Minutes)           20         W-2: Reading Graphs and Tables
                         30         W-3: Knowledge and Use of Reference Materials
  Fourth Session         30         M-1: Mathematics Concepts
  (85 Minutes)           30         M-2: Mathematics Problem Solving

The numbers of test items contained in the eleven separate subtests are 114, 178, 114, 102, 102, 86, 89, 74, 141, 136 and 96, respectively, following the order of subtests given in Table 9-7-1. For each of the six levels, 9 through 14, only a subset of each subtest is administered. The standardized administration schedule and the working time for each subtest are presented in Table 9-7-1. For the entire test battery, the working time required for the administration of each level of test is four hours and thirty-nine minutes. It is recommended that the test be administered on four consecutive days.

In  our  data,  only  the  tests  of  Levels  11,  12  and  13  were  used. 
The  numbers  of  test  items  contained  in  these  three  levels  of  test  are 
461,  487  and  500,  respectively.  A  graphical  representation  is  made 
in  Figure  9-7-1,  to  show  how  these  three  subsets  of  test  items  in 
each  subtest  overlap  among  the  three  levels. 


FIGURE 9-7-1

Test Items of Each of the Eleven Subtests of Iowa Tests of Basic Skills Administered to Each of Levels 11, 12 and 13, Which Are Represented by Shaded, Hollow, and Solid Bars, Respectively. [Bar chart; horizontal axis: Item Number, 0 to 160.]

We notice in Figure 9-7-1 that all the test items given to the students of Level 12 are also given to those of Level 11 or Level 13, or both. There are exactly one hundred test items which are given to all three levels of examinees. There are 264 which are given to Levels 11 and 12, and 323 to Levels 12 and 13, both counts including the one hundred items common to all three levels (so that 197 + 264 = 461, 264 + 323 - 100 = 487, and 177 + 323 = 500, the level totals given above). We also have 197 items which are taken by the examinees of Level 11 only, and 177 by those of Level 13 only. Thus the total number of distinct test items is 197 + 487 + 177 = 861.
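The overlap counts can be cross-checked with a little inclusion-exclusion arithmetic. The Python sketch below assumes, as the level totals require, that the 264 and 323 shared items each include the 100 items common to all three levels, and that no item is shared by Levels 11 and 13 alone; under that reading the counts reproduce the three level totals and give 861 distinct items.

```python
# Overlap counts for Levels 11, 12 and 13 (assumption: the pairwise counts
# include the items common to all three levels).
all_three = 100
shared_11_12 = 264
shared_12_13 = 323
only_11 = 197
only_13 = 177

level_11 = only_11 + shared_11_12                   # no items shared by 11 and 13 only
level_12 = shared_11_12 + shared_12_13 - all_three  # every Level 12 item is shared
level_13 = only_13 + shared_12_13
distinct = only_11 + level_12 + only_13

print(level_11, level_12, level_13, distinct)
```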

(IX. 8)  Original  and  Revised  Iowa  Data 

Data were collected in three different school systems in the State of Iowa in the years 1971 through 1977. In their original form, the total number of examinees, including both boys and girls, is 7,581. Of these, 28 students took the Level 9 Test and 114 took the Level 10 Test. Since these are relatively small numbers, we decided to exclude them from our original group of examinees. The other 7,439 examinees are classified into three subgroups, i.e., 2,460 students who took the Level 11 Test, 2,452 who took the Level 12 Test, and 2,527 who took the Level 13 Test. Hereafter, we shall call observations concerning these 7,439 examinees the original data.

It was found that there is a relatively small number of examinees who did not respond to a substantially large number of test items. While as many as 7,010 of the 7,439 examinees left only 49 or fewer test items unanswered, there are also 162 examinees who did not respond to 100 or more test items. Our raw data show that some examinees skipped an entire subtest, or more than one entire subtest. A close examination of the original data indicates that, if we exclude from our total group all the examinees who left at least one half of a subtest unanswered, then the number of examinees who left 200 or more test items unanswered becomes zero, and only 28 examinees who omitted more than 100, but fewer than 200, test items remain. For this reason, we have decided to exclude from our original group of examinees, for the detailed analysis, the 193 examinees who left one half of a subtest, or more, unanswered. Hereafter, we shall call observations concerning the remaining 7,246 examinees the revised data, to distinguish them from the original data.

Table 9-8-1 presents the item identifications of the fifty-five test items, i.e., 34 for Level 11, 15 for Level 12, and 6 for Level 13, to which less than 90 percent of examinees responded in the original data, together with their response percentages in the original and revised data, respectively. We can see in this table that for most of these fifty-five test items the two percentages show a visible improvement caused by the exclusion of the 193 examinees. For all three levels, this exclusion produced a substantial improvement in the percentage of examinees who answered in one way or another. Among other things, we notice that the number of test items answered by 99 percent or more of the examinees increased from 231 to 320 for Level 11, from 319 to 350 for Level 12, and from 286 to 377 for Level 13.




Table 9-8-2 presents the frequency distribution of test items for each of the eleven subtests with respect to the percentage of examinees who answered correctly, for each of Levels 11, 12 and 13, for the revised data. It should be noted that, even in the revised data, these percentages correct are not independent of the positions of the test items in each subtest. There is a distinct tendency for larger numbers of examinees not to respond to items presented later in each subtest. It is obvious, therefore, that for these later items the percentage for the correct answer is less than it would be in an ideal free-response situation.


(IX. 9)  Informative  Distractor  Model 

By  Informative  Distractor  Model,  we  mean  the  family  of  models 
in  which  we  assume  the  existence  of  specific  information  obtainable 
from  separate  alternative  answers,  including  the  correct  answer,  of 
each  multiple-choice  test  item. 


TABLE 9-8-2

Frequency Distribution of Items for Each of the Eleven Subtests with Respect to the Percentage of Examinees Answering Correctly. Each Interval of Percentage Is Greater than or Equal to the Lower End and Less than the Upper End.

Iowa Revised Data, Level 11

TABLE 9-8-2 (Continued)

Iowa Revised Data, Level 12

Subtest:  V  R  L1  L2  L3  L4  W1  W2  W3  M1  M2

46  76  46  42  42  32  40

Iowa Revised Data, Level 13

Subtest:  V  R  L1  L2  L3  L4  W1  W2  W3  M1  M2


For the type of tests to which Shiba's word comprehension tests belong, which will be introduced in Section X.1 of Chapter 10, some specific model, or models, within the Informative Distractor Model family are called for. Models A, B and C proposed by the author, which will be described in Section X.7, belong to this family of models. If we succeed in developing appropriate multiple-choice test items which follow this type of model, then they will no longer be blurred images of the corresponding free-response test items, but will provide us with additional information from the distractors, which free-response test items will never have.

(IX. 10)  Equivalent  Distractor  Model 

In contrast to the Informative Distractor Model, the Equivalent Distractor Model means the family of models in which no specific information is expected from the separate incorrect answers given as alternatives in the multiple-choice test item. Thus all the alternatives of a given multiple-choice item, except for the correct answer, are equivalent, since the information given by a specific alternative, or distractor, is not different from that given by each remaining wrong answer. The three-parameter logistic, or normal ogive, model belongs to this family of models.

In this model, all the information provided by a given wrong answer is pure noise resulting from random guessing, and, therefore, the alternative is equivalent to any remaining wrong answer. Note, however, that this type of model, which is based upon the knowledge or random guessing principle, is not the only one included in the Equivalent Distractor Model. Suppose that the operating characteristic of each wrong answer of a given multiple-choice item includes some information about the examinee's ability, but all the operating characteristics, or plausibility curves, of the distractors are identical. In such a case, we can say that the test item belongs to the Informative Distractor Model in the sense that these distractors provide us with some information concerning the examinee's ability. On the other hand, we can also say that the item belongs to the Equivalent Distractor Model, since no distractor has any specific information which distinguishes it from the other distractors. For convenience, in the present paper, we shall take the second standpoint, defining the Informative Distractor Model in the narrower sense.

(IX. 11)  Index  k*  for  the  Invalidation  of  the  Equivalent
Distractor  Model

It is obvious that Index k*, which was introduced in Section IX.5, can be used for the invalidation of the Equivalent Distractor Model, and even as weak support for the Informative Distractor Model. If Index k* turns out to be far less than m, then we must reject the hypothesis that our model belongs to the Equivalent Distractor Model. If it assumes a value close to m, then we shall say that the Equivalent Distractor Model may be adequate. In both cases, however, the Informative Distractor Model remains among the possibilities.

It is noted that the traditional chi-square test with (m-2) degrees of freedom for the goodness of fit of the frequencies of the (m-1) wrong answers, with the uniform distribution as the theoretical distribution, may serve our purpose just as well, without using Index k*. In our pilot study, we applied it to the original data of 7,439 examinees. It turned out that only 23, 22 and 21 test items indicate acceptance of the respective uniform distributions, or acceptance of the Equivalent Distractor Model, for Levels 11, 12 and 13, respectively, even if we take as low a level of significance as 0.0005. This comes from the fact that our sample sizes are so large that the chi-square test is very sensitive to small deviations from the hypothesized uniform distributions. We must question, however, whether such small deviations mean anything for our purpose. If, for instance, the hypothesized uniform distribution provides us with the probability 0.15 for each of the three wrong answers and the true distribution gives 0.16, 0.14 and 0.15, respectively, then the detection of these small deviations, i.e., 0.01 at most, will not make a strong basis for the rejection of the Equivalent Distractor Model.

In contrast to the chi-square test, the estimated Index k* is insensitive to the sample size, because the sampling fluctuation enters the resulting estimate only through the computation of the proportions (cf. Section IX.5). Thus, if we wish to more or less ignore the sampling fluctuations of the proportions, then Index k* may be adopted, and its values are comparable across different sample sizes.
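The contrast between the two approaches can be made concrete with the example above of wrong-answer probabilities 0.16, 0.14 and 0.15 (with 0.55 assumed for the correct answer). In the Python sketch below, which uses idealized counts with no sampling noise and hypothetical function names, the chi-square statistic grows in proportion to the number of examinees N, while the estimated k* depends only on the proportions. For 2 degrees of freedom (m = 4) the upper-tail probability is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_uniform(wrong_counts):
    """Pearson chi-square of the wrong-answer counts against a uniform split."""
    expected = sum(wrong_counts) / len(wrong_counts)
    return sum((o - expected) ** 2 / expected for o in wrong_counts)


def p_value_df2(x):
    """Upper-tail probability of chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def k_star(freq_ratios, r):
    """Estimated Index k* via (9.24), (9.19)-(9.20) and (9.25); ratios > 0."""
    wrong = [x for i, x in enumerate(freq_ratios) if i != r]
    p_r_star = math.exp(sum(x * math.log(x) for x in wrong) / sum(wrong))
    pstar = [p_r_star if i == r else x for i, x in enumerate(freq_ratios)]
    q = [x / sum(pstar) for x in pstar]
    return math.exp(-sum(x * math.log(x) for x in q))


correct, wrong = 0.55, [0.16, 0.14, 0.15]
k_est = k_star([correct] + wrong, r=0)
chi = {n: chi_square_uniform([p * n for p in wrong]) for n in (1000, 20000)}
for n, x in chi.items():
    print(f"N = {n}: chi-square = {x:.2f}, p = {p_value_df2(x):.2g}, k* = {k_est:.4f}")
```

With 1,000 examinees the deviation is far from significant, whereas with 20,000 the same proportions are rejected even at the 0.0005 level; k* stays just below 4 in both cases.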

(IX. 12)  Results  Obtained  by  Using  Index  k*  on  Iowa  Data 

Table 9-12-1 presents the frequency distribution of the items of each of the ten subtests other than Subtest L1, which consists of five-alternative test items, with respect to the resulting values of the estimated Index k*, for each of Levels 11, 12 and 13. The corresponding result for Subtest L1 is presented separately, as Table 9-12-2, for all three levels. We can see in Table 9-12-1 that the configurations of these frequencies are similar across the three levels, with the estimated Index k* ranging from 2.25 through 4.00 for each level. This is also the case with Subtest L1, with the estimated Index k* ranging from 2.25 through 4.50 for most items, as is shown in Table 9-12-2. We notice in Table 9-12-1 that, for each level, the mode of the total frequency distribution is the highest category, 3.75 through 4.00. If we examine the frequency distributions of the separate subtests, however, we will notice some variation among their configurations. Above all, Subtests L2, L3 and L4 have modes other than the highest category, mostly either the category 3.00 through 3.25 or the category 3.25 through 3.50. This tendency is also shared by Subtest L1, which has five-alternative multiple-choice test items, as is shown in Table 9-12-2.

Eight examples of the frequency distribution of the examinees with respect to their choice of an answer out of the four alternatives are presented in Figure 9-12-1. These test items are selected from the subset of 76 test items for Level 13 whose Index k*'s are 3.9 or greater. For Levels 11 and 12, there are 78 and 73 such test items, respectively. In each histogram, a dotted line also shows the estimated proportion, P_R*, multiplied by the number of examinees who answered the item in one way or another, i.e., the total number of examinees minus the number of those who did not answer the item at all. We can see in this figure that most of these histograms are close to rectangles if we replace the frequency for the correct answer by the height indicated by the dotted line in each histogram.

TABLE 9-12-2

Frequency Distribution of Five-Alternative Items of Subtest L1 of Iowa Tests of Basic Skills, with Respect to Index k*, for Levels 11, 12 and 13, Respectively.

        Range of               Level
        Index k*          11    12    13    Total
   1    1.00 - 1.25        0     0     0      0
   2    1.25 - 1.50        0     0     0      0
   3    1.50 - 1.75        0     0     0      0
   4    1.75 - 2.00        0     0     0      0
   5    2.00 - 2.25        0     0     0      0
   6    2.25 - 2.50        4     4     1      9
   7    2.50 - 2.75        5     1     2      8
   8    2.75 - 3.00        4     4     4     12
   9    3.00 - 3.25        2     2     8     12
  10    3.25 - 3.50        5    11     6     22
  11    3.50 - 3.75        9     7     5     21
  12    3.75 - 4.00        4     5    11     20
  13    4.00 - 4.25        5     6     4     15
  14    4.25 - 4.50        4     5     4     13
  15    4.50 - 4.75        *     *     0      1
  16    4.75 - 5.00        *     *     3      4
        Total             43    46    48    137

  * One item in each of these two ranges belongs to Level 11 and the other to Level 12; which is which is not legible in the source.
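The dotted line in each histogram can be computed as follows. In this Python sketch (hypothetical function name and counts), P_R* is obtained from (9.24) using frequency ratios among the examinees who answered the item, and is then multiplied by that number of examinees. When the wrong answers are chosen equally often, the dotted line coincides with their common height, giving the rectangular shape described above.

```python
import math


def dotted_line_height(counts, r):
    """P_R* times the number of examinees who answered the item.

    counts -- number of examinees choosing each alternative (omits excluded)
    r      -- index of the correct answer
    """
    answered = sum(counts)
    wrong = [c / answered for i, c in enumerate(counts) if i != r]
    w = sum(wrong)
    # (9.24): P_i-weighted geometric mean of the wrong-answer frequency ratios.
    p_r_star = math.exp(sum(x * math.log(x) for x in wrong if x > 0) / w)
    return p_r_star * answered


# Hypothetical four-alternative items; alternative 1 (index 0) is correct.
flat = dotted_line_height([400, 100, 100, 100], r=0)
uneven = dotted_line_height([1200, 150, 140, 160], r=0)
print(round(flat, 3), round(uneven, 3))
```

The height always lies between the smallest and largest wrong-answer frequencies, since it is a weighted geometric mean of them.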

In the total set of 227 test items whose values of the estimated Index k* are 3.9 or greater, we find only four test items from Subtests L2, L3 and L4, i.e., L2-58 (k* = 3.95473) and L3-49 (k* = 3.95320) of Level 11, and L3-49 (k* = 3.95658) and L2-58 (k* = 3.95318) of Level 12, which are actually two items shared by Levels 11 and 12. A close examination of the contents of the test items of these four subtests, including Subtest L1, and of their results of analysis reveals the following facts.

(1) All the questions in these four language skill subtests are in the form of having the examinee find mistakes in spelling, capitalization, punctuation and usage, respectively.

(2) Unlike the test items in the other seven subtests, these items have "No mistakes" as the last alternative, and for most items this alternative has a high frequency, even when it is a wrong answer.

FIGURE 9-12-1

Frequency Distribution of Examinees of Level 13 with Respect to Their Responses to Each of Eight Test Items of Iowa Tests of Basic Skills Sampled from Those Whose Values of Index k* Are 3.9 or Greater, with the Estimated Proportion of the Examinees Guessing Correctly (Dotted Line). [Eight histograms; horizontal axis: Alternative Code, 1 to 4.]


From these facts and the above results, it is obvious that the Equivalent Distractor Model is not suitable for the items of the four subtests of language skills, including Subtest L1, which consists of five-alternative test items. For these items, the Informative Distractor Model may be more appropriate.

Figure 9-12-2 presents histograms similar to those in Figure 9-12-1 for the frequency distributions of eight four-alternative test items, which were selected from the subset of 9 test items for Level 13 whose Index k*'s are less than or equal to 2.6. The corresponding numbers of test items are 7 for Level 11 and 11 for Level 12. We can see in this figure that these histograms, with the frequencies for the correct answers replaced by the corresponding dotted lines, are far from rectangles. There is no reason to accept the Equivalent Distractor Model for these test items.

(IX. 13)  Comparison  of  the  Results  on  Common  Test  Items  for  Three 
Levels  of  Examinees  in  Iowa  Study 

There are certain test items which are included in all three levels. Their numbers are nine for Subtest V, nineteen for Subtest R, nine for Subtest L1, ten for Subtest L2, ten for Subtest L3, eleven for Subtest L4, ten for Subtest W1, six for Subtest W2 and sixteen for Subtest W3, which makes the total number of test items shared by all three levels one hundred. No item of Subtests M1 and M2 is included in all three levels.

It is evident that, for the behavior of a test item to follow the Equivalent Distractor Model, the value of the estimated Index k* should be close to m not only for one level of examinees but for all three levels. It will be worthwhile, therefore, to compare the results across the three levels for the one hundred test items which are included in all three levels of test. We find that only 7 out of the 91 four-alternative test items, i.e., V-61, R-88, W1-45, W1-46, W2-44, W2-45 and W3-70, have three estimates of Index k* all of which are greater than or equal to 3.9. If we shift this critical value from 3.9 to 3.8, these seven four-alternative test items are joined by eleven more items, i.e., V-63, V-66, R-80, R-90, R-92, L2-58, L3-49, W1-40, W1-43, W1-47 and W2-41. There are no five-alternative test items of Subtest L1 which are comparable to these eighteen four-alternative test items.

FIGURE 9-12-2

Frequency Distribution of Examinees of Level 13 with Respect to Their Responses to Each of Eight Test Items of Iowa Tests of Basic Skills Sampled from Those Whose Values of Index k* Are 2.6 or Less, with the Estimated Proportion of the Examinees Guessing Correctly (Dotted Line). [Eight histograms over alternative codes 1 to 4, including items 84 and 99 of Subtest V and items 63, 70 and 79 of Subtest L2; the panel values of k* range from 2.25958 to 2.54443.]

Figure 9-13-1 presents four examples of sets of three histograms for Levels 11, 12 and 13, similar to those in Figures 9-12-1 and 9-12-2, sampled from the nineteen shared test items of Subtest R.


It is interesting to note that some items show evidence of differential information provided by separate wrong answers. For example, alternative 4 of R-80 seems to attract students of intermediate reading ability, while alternative 1 of the same item appears to attract students of lower ability. It is clear that many items have one or more effective distractors; among others, alternative 2 of R-86 proved to be powerful. Most histograms show some regularity in the way the frequencies change across the three levels, which suggests that the examinees selected their answers intentionally rather than by random guessing.

For  the  detail  of  the  Iowa  Study,  the  reader  is  directed  to 
the  research  report,  RR-80-1. 

(IX. 14)  Remarks  on  the  Usage  of  Index  k*

It should be noted that high values of Index k* can occur in situations where the Informative Distractor Model is perfectly legitimate. When this happens, our information is differentiated for the separate distractors, and yet the numbers of examinees who selected the separate distractors as their answers are close to one another. This is an ideal situation for our purpose of mental measurement, because not only is each distractor informative, but all the distractors are also well used, with the examinees' answers distributed evenly over them. We recall that Sato's Index k, which was introduced in Section IX.4, serves this purpose and works well in the small classroom situation, where teachers supervise their students well and there is little chance for the students to resort to random guessing.

This fact makes us realize that we must be careful before
drawing conclusions from the estimated values of Index k*. Observing
the values of Index k* across several subpopulations of examinees of
different ability levels, like Levels 11, 12 and 13 of the
Iowa Data introduced in the preceding section, is one of
the ways of finding out the cause of high values of Index k*. If
the high values are due to the equivalence of the distractors, then we will have
similar values of Index k* across the subpopulations; if differential
information exists for the separate distractors, then the values of
Index k* will differ among the subpopulations, provided that
their ability differences are substantial. Another way is to compare
the sample means of the ability estimate among the subgroups of
examinees who selected the separate distractors as their answers. If
differential information exists, then these sample means of the
ability estimate will also differ, while they will stay close
to one another if the distractors are equivalent. This was done in
Shiba's study, which will be introduced in Section X.3 of the next
chapter.
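The second check described above, comparing the sample means of the ability estimate across the subgroups that chose each alternative, amounts to grouping and averaging. The Python sketch below is illustrative only: the helper name `mean_ability_by_alternative` and the (alternative, ability estimate) input format are assumptions, and the data are hypothetical.

```python
from collections import defaultdict

def mean_ability_by_alternative(responses):
    """Return {alternative: mean ability estimate} for (alternative, theta_hat) pairs.

    Hypothetical helper: it only groups and averages; judging whether the
    distractor means differ substantially is left to the analyst.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for alternative, theta_hat in responses:
        totals[alternative] += theta_hat
        counts[alternative] += 1
    return {alt: totals[alt] / counts[alt] for alt in sorted(totals)}

# Hypothetical responses to one item; alternative 1 is taken as the keyed answer.
responses = [(1, 0.8), (1, 0.4), (2, -0.2), (2, -0.4), (3, -1.0), (3, -1.2)]
means = mean_ability_by_alternative(responses)

# Spread of the distractor means: near zero suggests equivalent distractors,
# a large value suggests differential information.
distractor_means = [m for alt, m in means.items() if alt != 1]
spread = max(distractor_means) - min(distractor_means)
print(means, spread)
```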

With  these  considerations  in  mind,  Index  k*  can  be  used 
effectively. 


REFERENCES 


[1]  Birnbaum, A. Some latent trait models and their use in
     inferring an examinee's ability. In F. M. Lord and
     M. R. Novick, Statistical theories of mental test scores.
     Addison-Wesley, 1968, Chapters 17-20.

[2]  Hieronymus, A. N. and E. F. Lindquist. Iowa test of basic
     skills (Levels ed.), Form 6. Boston: Houghton-Mifflin
     Company, 1971.

[3]  Iowa Basic Skills Testing Program. Iowa test of basic skills
     teacher's manual. Iowa City, Iowa: University of Iowa,
     1971.

[4]  Lord, F. M. An analysis of the verbal scholastic aptitude
     test using Birnbaum's three-parameter logistic model.
     Educational and Psychological Measurement, 1968, 28,
     989-1020.




X  A New Family of Models for the Multiple-Choice Test Item: II

In the preceding chapter, Index k* was introduced, and
through the work on the Iowa Data we have seen that models
based upon the principle of knowledge or random guessing
do not work for many multiple-choice test items. In this chapter,
Shiba's research on word comprehension, and then the new family
of models for the multiple-choice test item, will be introduced.

(X.1)  Shiba's Word Comprehension Tests

The battery of tests used for the construction of the word
comprehension scale consists of eleven tests, A1, A2, A3, A4, A5,
A6, J1, J2, S1, S2 and U. Each test contains thirty to fifty-eight
multiple-choice items, each having a set of five alternatives.
These tests differ in difficulty, and each of them is designed for
a different age group, ranging from six-year-olds to college
students. Some subsets of items are included in two
tests which are adjacent to each other in difficulty. For example,
items 37 through 56 of Test J1 are also items 1 through 20 of Test
J2. The number of examinees used for the word comprehension scale
construction varies between 412 sixth graders of elementary schools
for Test A6 and 924 first-year students of senior high schools for Test
S1 (Shiba, 1978).

The model adopted for the item characteristic function of
each vocabulary item is the logistic model given by (8.12),
with D = 1.7, as a substitute for the normal ogive model. Note
that Shiba did not use the three-parameter logistic model. This is
based upon his belief, formed through his extensive experience in
test construction and research, that three-parameter models are not
applicable to well-developed multiple-choice items.

The  author  found  Shiba's  research  very  interesting, 
especially  in  the  following  aspects. 

(1)  The  word  comprehension  tests  are  very  well  constructed. 




choosing  each  alternative  carefully. 

(2)  Unlike many researchers in the United States, they have
tried to make full use of the distractors.

(3)  Subjects  were  selected  from  many  different  age  groups. 

(X.2)  Subjects  Used  in  Shiba's  Research 

Each  of  the  eleven  tests  was  administered  to  a  group  of 
subjects  who  belong  to  a  single  school  year,  except  for  college 
students.  Hereafter,  for  convenience,  we  shall  use  EL  for 
elementary  schools,  JH  for  junior  high  schools,  SH  for  senior  high 
schools,  and  CS  for  colleges,  and  add  the  school  year  after  each 
symbol.  For  instance,  by  SH2  we  mean  a  group  of  subjects  who  are 
in  the  second  year  of  senior  high  schools.  The  correspondence  of 
the  subject  groups  and  the  tests  administered  is  summarized  as 
follows: 

A1 for EL1 (650),  A2 for EL2 (650),  A3 for EL3 (546),

A4 for EL4 (617),  A5 for EL5 (599),  A6 for EL6 (412),

J1 for JH1 (614),  J2 for JH2 (758),  S1 for SH1 (924),

S2 for SH2 (759),  and U for CS (740),

where the numbers in parentheses indicate the respective numbers of
examinees. Note that JH3 and SH3 are not included in the data
on which the word comprehension scale construction is based.

(X.3)  Methods  and  Results  of  Shiba's  Research 

It is assumed that, for each of the eleven groups of
examinees, the ability distribution is normal. The principal
factor solution of factor analysis is applied to the tetrachoric
correlation matrix of each group of examinees, using the largest
absolute value of the correlation coefficients in each row, or
column, as the communality. This step also serves to validate
the unidimensionality of ability. Figure 10-3-1
illustrates the resulting set of eigenvalues for Test J1, which was
administered to 614 first-year junior high school students. It
turned out that the first eigenvalue is much larger than all the
other eigenvalues, and thus unidimensionality was confirmed.
Hereafter, this first principal factor is treated as $\theta$.

FIGURE 10-3-1

Eigenvalues of the Correlation Matrix of the Fifty-Five Items of
Test J1, Ordered with Respect to Their Magnitudes. (Shiba's Data)

Let $\rho_g$ be the factor loading (e.g., Lawley and Maxwell,
1971) of the first principal factor, or $\theta$, for item $g$. The
item discrimination parameter, $a_g$, is obtained by

(10.1)    $a_g = \rho_g \, (1 - \rho_g^2)^{-1/2}$ .

Let $\Phi(u)$ denote the standard normal distribution function, such
that

(10.2)    $\Phi(u) = (2\pi)^{-1/2} \int_{-\infty}^{u} e^{-t^2/2} \, dt$ .

The item difficulty parameter, $b_g$, is given by

(10.3)    $b_g = \Phi^{-1}(1 - p_{gR}) \, \rho_g^{-1}$ ,

where $p_{gR}$ is the probability with which an examinee answers
item $g$ correctly. In practice, this is replaced by the frequency
ratio, $\hat{p}_{gR}$, to provide us with the estimate of $b_g$.
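Equations (10.1) and (10.3) turn a first-factor loading and an observed proportion correct into normal ogive item parameters. The Python sketch below uses the standard library's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; the function names are hypothetical, and the loading and proportion values are illustrative, not taken from Shiba's data.

```python
import math
from statistics import NormalDist

def discrimination_from_loading(rho):
    """(10.1): a_g = rho_g * (1 - rho_g**2) ** (-1/2); requires |rho_g| < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("the loading must lie strictly between -1 and 1")
    return rho / math.sqrt(1.0 - rho * rho)

def difficulty_from_loading(rho, p_correct):
    """(10.3): b_g = Phi^{-1}(1 - p_gR) / rho_g, with the observed
    proportion correct standing in for p_gR."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("the proportion correct must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_correct) / rho

a = discrimination_from_loading(0.6)                      # 0.6 / 0.8
b = difficulty_from_loading(0.5, NormalDist().cdf(1.0))   # Phi^{-1}(0.1587) / 0.5
print(round(a, 3), round(b, 3))
```

An item answered correctly by exactly half the group gets $b_g = 0$ whatever its loading, since $\Phi^{-1}(0.5) = 0$.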

The eleven ability scales thus constructed are assumed to be
on the same continuum, and they are integrated into a single scale.
This equating is made through the ten subsets of items, each of
which is shared by two adjacent tests. Let $a_g$ and $b_g$ be the
item parameters estimated from the result of the first test, and
$a_g^*$ and $b_g^*$ be those from the result of the second test. Denoting
the two ability scales by $\theta$ and $\theta^*$, respectively, we can write

(10.4)    $a_g(\theta - b_g) = a_g^*(\theta^* - b_g^*)$ ,

since the item characteristic functions of the same item $g$, which follow
the normal ogive model, must assume the same value on the two ability
scales for corresponding values of $\theta$ and $\theta^*$.
Thus the functional relationship between $\theta$ and $\theta^*$ is given by

(10.5)    $\theta^* = (a_g / a_g^*) \, \theta + [\, b_g^* - (a_g / a_g^*) \, b_g \,]$ ,

which is linear, and the two coefficients are obtained from these
four parameters. In practice, we obtain as many sets of
coefficients as there are common items, and we need to use some
type of "average" of these coefficients for the scale transformation.
Figure 10-3-2 presents the ability distributions of the eleven subject
groups after such transformations were made, with the mean and the
standard deviation of the distribution of JH1 taken as the
origin and the unit of the new, integrated ability dimension.

FIGURE 10-3-2

Estimated Density Functions of the Twelve Groups of Examinees, Which Are Assumed to Be Normal.
The Ability Scale Is Defined in Such a Way that the Density Function of the First Grade Group
of Junior High School (JH1) Is N(0,1). (Shiba's Data)
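Relation (10.5) yields one slope and one intercept per common item, and the text leaves the choice of "average" open. The sketch below takes the plain arithmetic mean of the per-item coefficients; this is only one possible choice and not necessarily the one Shiba used. The function names are hypothetical, and the item parameters were generated from a known transformation $\theta^* = 0.8\,\theta + 0.5$ so that the result can be checked.

```python
def linking_coefficients(common_items):
    """Per-item slope and intercept of (10.5): theta* = A * theta + B,
    with A = a_g / a*_g and B = b*_g - A * b_g.
    common_items holds (a_g, b_g, a*_g, b*_g) tuples."""
    coefficients = []
    for a, b, a_star, b_star in common_items:
        slope = a / a_star
        coefficients.append((slope, b_star - slope * b))
    return coefficients

def mean_coefficients(coefficients):
    """Arithmetic mean of the per-item slopes and intercepts (one possible 'average')."""
    n = len(coefficients)
    return (sum(s for s, _ in coefficients) / n,
            sum(t for _, t in coefficients) / n)

# Hypothetical common items, built from theta* = 0.8 * theta + 0.5.
common_items = [(1.0, 0.0, 1.25, 0.5), (1.6, -1.0, 2.0, -0.3), (0.8, 1.0, 1.0, 1.3)]
slope, intercept = mean_coefficients(linking_coefficients(common_items))
print(round(slope, 3), round(intercept, 3))
```

With real estimates the per-item coefficients scatter, and a robust or weighted average may be preferable; the arithmetic mean is used here only for simplicity.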


The item characteristic function of each item on the new,
integrated scale $\theta$ is approximated by the logistic function given
by (8.12). The maximum likelihood estimate, $\hat{\theta}_j$, of each
examinee's ability is obtained through the equation

(10.6)    $\sum_{g=1}^{n} a_g P_g(\hat{\theta}_j) = \sum_{g=1}^{n} a_g x_{gj}$

(cf. Birnbaum, 1968), where $x_{gj}$ is the binary item score of
individual $j$ for item $g$. The item information function of
each test item, and then the test information function of each test, are
obtained (cf. Section III.4).

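The theoretical frequency distribution of test score $T$ for
each test and examinee group can be written as

(10.7)    $\sum_{s=1}^{N} \sum_{V \in T} \prod_{x_g \in V} P_g(\hat{\theta}_s)^{x_g} \, [1 - P_g(\hat{\theta}_s)]^{1 - x_g}$ ,

where $N$ is the number of examinees in the group, $\hat{\theta}_s$ is the
ability estimate of examinee $s$, $V$ is a response pattern, i.e., a vector
of $n$ item scores, the inner sum runs over the response patterns whose
test score is $T$, and $T$ is the test score given by

(10.8)    $T = \sum_{g=1}^{n} x_g$ .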

This is used for the validation of the model and the assumptions
adopted in the process of analysis. The sample mean of the maximum
likelihood estimates $\hat{\theta}$ of the subgroup of examinees who selected
each of the five alternatives is calculated for each item of each
test. A tailored test of word comprehension is constructed by
selecting an appropriate subset of items from these eleven tests,
in such a way that an individual is directed to a next item which
is chosen on the basis of the sample mean of $\hat{\theta}$ for the alternative
he has selected on the present item.
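Summing over every response pattern $V$ with score $T$ in (10.7) means enumerating $2^n$ patterns. The sketch below instead uses the Lord-Wingersky recursion, a standard technique swapped in here and not described in this report, which builds the conditional distribution of $T$ one item at a time; it is checked against direct enumeration. It assumes logistic item characteristic functions with $D = 1.7$ and sums the conditional distributions over a given set of ability values; the item parameters and ability values are hypothetical.

```python
import math
from itertools import product

D = 1.7

def p_logistic(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def score_distribution_given_theta(theta, items):
    """Lord-Wingersky recursion: P(T = t | theta) for t = 0..n."""
    dist = [1.0]
    for a, b in items:
        p = p_logistic(theta, a, b)
        new = [0.0] * (len(dist) + 1)
        for t, prob in enumerate(dist):
            new[t] += prob * (1.0 - p)   # item answered incorrectly
            new[t + 1] += prob * p       # item answered correctly
        dist = new
    return dist

def expected_frequencies(thetas, items):
    """Sum of the conditional score distributions over the given ability values."""
    freq = [0.0] * (len(items) + 1)
    for theta in thetas:
        for t, prob in enumerate(score_distribution_given_theta(theta, items)):
            freq[t] += prob
    return freq

def brute_force(theta, items):
    """Direct enumeration of all response patterns V, grouped by score T."""
    dist = [0.0] * (len(items) + 1)
    for pattern in product((0, 1), repeat=len(items)):
        prob = 1.0
        for x, (a, b) in zip(pattern, items):
            p = p_logistic(theta, a, b)
            prob *= p if x else 1.0 - p
        dist[sum(pattern)] += prob
    return dist

items = [(1.0, -1.0), (1.2, 0.0), (0.8, 0.5), (1.5, 1.0)]  # hypothetical (a_g, b_g)
thetas = [-0.5, 0.0, 0.7]                                   # hypothetical abilities
freq = expected_frequencies(thetas, items)
print([round(f, 3) for f in freq])
```

The recursion costs on the order of $n^2$ operations per ability value instead of $2^n$, which matters for tests of thirty to fifty-eight items.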

The research conducted by Shiba and others includes more
interesting data than those used in the word comprehension scale
construction. Table 10-3-1 presents a part of them: the
frequency distribution of the alternatives selected by first-year
students of junior high schools, and the mean of the maximum
likelihood estimate of ability for each alternative, for the
nineteen items included in both Tests J1 and J2 and administered
to four different subject groups, JH1, JH2(a), JH2(b) and JH3. The
same table also presents the discrepancy between the mean
of $\hat{\theta}$ for the correct answer and the lowest mean $\hat{\theta}$ among
the four wrong answers, under the heading "Largest Discrepancy."
The correct answers are always identified as the ones which have
the highest means of $\hat{\theta}$.

TABLE 10-3-1

Mean of the Maximum Likelihood Estimates of Ability, θ̂, for Each of the
Five Subgroups of Subjects Selecting Different Alternatives, for Each of
the 19 Vocabulary Test Items, Together with the Actual Frequency Distri-
butions (FRQ). The Difference between the Mean θ̂ of the Correct Sub-
group and the Lowest Mean θ̂ Is Also Presented As Largest Discrepancy
for Each Item. Test J1, Junior High School Grade 1

                                  Alternative
Item  Index           1        2        3        4        5   Total   Largest Discrepancy

 37   Mean θ̂      0.401   -0.476   -0.482   -0.750   -0.148           1.151
      FRQ           287       50       59       59      117     572
 38   Mean θ̂
      FRQ
 39   Mean θ̂     -0.192   -0.091   -0.270   -0.243    0.400           0.670
      FRQ            91      115      118       51      187     562
 40   Mean θ̂      0.071   -0.416   -0.336    0.310   -0.479           0.789
      FRQ            60      141       90      273        9     573
 41   Mean θ̂     -0.557   -1.007   -0.445   -0.456    0.254           1.261
      FRQ            53       20       23       85      392     573
 42   Mean θ̂      0.339   -0.570    0.036   -0.439   -0.387           0.909
      FRQ           247       21      121       84       97     570
 43   Mean θ̂     -0.512    0.376   -0.572   -0.245   -0.393           0.948
      FRQ            26      308       98       67       73     572
 44   Mean θ̂     -0.293   -0.547   -0.595    0.271   -0.318           0.866
      FRQ           119       67       14      333       36     569
 45   Mean θ̂     -0.638   -0.412   -0.636    0.395   -0.593           1.033
      FRQ            51       25      123      346       23     568
 46   Mean θ̂      0.444   -0.741   -0.325   -0.428   -0.534           1.185
      FRQ           296       46       44      164       18     568
 47   Mean θ̂     -0.261    0.270   -0.078   -0.426   -0.101           0.696
      FRQ            69      224      158       53       65     569
 48   Mean θ̂     -0.129   -0.024   -1.013   -0.467    0.412           1.425
      FRQ            81      100       58       67      258     564
 49   Mean θ̂     -0.339   -0.390   -0.284   -0.464    0.309           0.773
      FRQ           115       31       42       70      315     573
 50   Mean θ̂      0.349   -0.256   -1.015   -0.317   -0.385           1.364
      FRQ           308       46       35       86       96     571
 51   Mean θ̂     -0.137   -0.640   -0.077   -0.136    0.429           1.069
      FRQ            89       82       75      113      201     560
 52   Mean θ̂     -0.219    0.291   -0.110   -0.608   -0.095           0.899
      FRQ           116      235       80       34      100     565
 53   Mean θ̂     -0.071   -0.030   -0.453    0.527   -0.241           0.980
      FRQ           163       51       34      143      181     572
 54   Mean θ̂      0.132   -0.060   -0.084   -0.037   -0.263           0.415
      FRQ           182      111      100      142       26     561
 55   Mean θ̂      0.114   -0.278   -0.172   -0.533    0.690           1.223
      FRQ            27       72      317       29      126     571
 56   Mean θ̂     -0.460   -0.113   -0.412    0.742    0.015           1.202
      FRQ           104      101      115      141      111     572
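The "Largest Discrepancy" column can be recomputed from the means in Table 10-3-1. The sketch below follows the rule stated in the text, taking the alternative with the highest mean $\hat{\theta}$ as the correct answer; the three rows are copied from the table, and the function name is hypothetical.

```python
# Mean theta-hat for alternatives 1-5, copied from Table 10-3-1 (Test J1, JH1).
table_means = {
    37: [0.401, -0.476, -0.482, -0.750, -0.148],
    39: [-0.192, -0.091, -0.270, -0.243, 0.400],
    55: [0.114, -0.278, -0.172, -0.533, 0.690],
}

def largest_discrepancy(means):
    """Return (keyed alternative, its mean minus the lowest wrong-answer mean),
    keying the alternative with the highest mean theta-hat."""
    keyed = max(range(len(means)), key=means.__getitem__)
    lowest_wrong = min(m for i, m in enumerate(means) if i != keyed)
    return keyed + 1, means[keyed] - lowest_wrong

results = {item: largest_discrepancy(m) for item, m in table_means.items()}
for item, (alternative, gap) in results.items():
    print(item, alternative, round(gap, 3))
```

Item 55 shows why the mean-based rule is informative: the keyed alternative 5 is chosen by only 126 of 571 examinees while alternative 3 draws 317, yet alternative 5 still has the highest mean ability.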

(X.4)  Distractors  As  Resources  of  Information 

Shiba's research is based upon his belief in the usefulness
of distractors as important resources of information, in addition
to the correct answer. This is the same belief which the author
has kept in mind for many years (cf. Samejima, 1968). As long as we
score the multiple-choice test item correct or incorrect and treat
it as a binary item, it can never surpass the free-response test
item, but will always remain a "blurred" image of the free-response
test item, owing to the noise caused by the examinee's guessing
behavior, etc. If we make full use of the information given by
distractors, however, then the multiple-choice test item will have
merit of its own, and can even be more informative than the
free-response test item.

It is the researchers' responsibility to increase the efficiency
of mental measurement. To ignore whatever legitimate information
we can obtain from our research data is against this principle. If
distractors can serve this purpose, we should certainly not
stay with models like the three-parameter logistic model, in which
all the wrong answers given as alternatives in the multiple-choice
test item are treated as being equivalent, without any information
of their own. It will be worth our effort to investigate the Informative
Distractor Model rather than to stay with the Equivalent
Distractor Model (cf. Sections IX.9 and IX.10).

(X.5)  Mathematical  Models  in  Physics  and  in  Psychology 

The role of mathematical models in any science may be to
describe its reality following an appropriate rationale. We must
recognize, however, a difference between the role of mathematical
models in pure natural sciences, like physics, and their role in
psychology. This difference comes from the fact that, while in
physics it is impossible or meaningless to change the natural phenomena
to which objects react, in psychology many phenomena to which
persons react are also made by persons, and it is quite legitimate to
change them for good causes.

The latter logic is directly applicable to models for the
multiple-choice test item. An important implication is that we
may be able to do better than passively supplying mathematical models for
existing test items. If we conceive of
mathematical models which, in theory, will enhance the efficiency
of mental measurement, we shall be able to advise test constructors
to develop the types of multiple-choice test items which follow our
models, instead of accepting whatever test items they produce. We
can also adjust the pressure put upon examinees, and its direction,
by changing our instructions appropriately. For
example, we can effectively discourage our examinees from guessing, or
from skipping items.

(X.6)  Normal Ogive Model on the Graded Response Level and Bock's
Multinomial Model

The normal ogive model, which was originally introduced as a
model for the binary, free-response test item, has been extended to
a more general case, in which an item is graded into more than
two item score categories (Samejima, 1969, 1972). Bock has
proposed a multinomial model (Bock, 1972) for the multiple-choice
test item. It has been pointed out (Samejima, 1972) that, although
Bock's model was originally developed for nominal categories, i.e.,
categories which are not ordered among themselves, it can be
considered as a model in the heterogeneous case on the graded
response level.

Let $g$ be a multiple-choice item, $h$, $i$ or $k$ be one of
its $m_g$ alternatives, and $X_{hg}$, $X_{ig}$ or $X_{kg}$ be the response
tendency for the alternative $h$, $i$ or $k$. When any two
alternatives, $h$ and $k$, are compared alone, the probability with
which $h$ is chosen in preference to $k$ is assumed to be a function
of ability $\theta$, and is denoted by $\pi_{hk}(\theta; g)$. Thus we can write

(10.9)    $\pi_{hk}(\theta; g) + \pi_{kh}(\theta; g) = 1$ .


When the comparison is made among $m_g$ ($\geq 2$) alternatives, the
conditional probability with which the alternative $h$ is chosen
in preference to all the other $(m_g - 1)$ alternatives, given $\theta$, is
denoted by $P_h(\theta; g)$, and we have

(10.10)    $\sum_{h=1}^{m_g} P_h(\theta; g) = 1$ .


We shall define a variable $X_{hk \cdot g}$, such that

(10.11)    $X_{hk \cdot g} = X_{hg} - X_{kg}$ ,

i.e., the difference between the two response tendencies, $X_{hg}$ and
$X_{kg}$. Hereafter, for simplicity, we shall drop the subscript $g$
whenever it is clear that we are dealing with only one multiple-choice
item. Thus, in such a case, $\pi_{hk}(\theta)$ is used for $\pi_{hk}(\theta; g)$,
$X_{hk}$ for $X_{hk \cdot g}$, and so forth.

In the multinomial model, it is assumed that: 1) the
conditional distribution of $X_k$, given $\theta$, is normal, with
$\mu_k(\theta; g)$, or $\mu_k(\theta)$, and $\sigma_k(\theta; g)$, or $\sigma_k(\theta)$, as the two
parameters; 2) the $X_k$'s are conditionally mutually independent, given
$\theta$; and 3) the ratio of the probabilities with which the two
alternatives are chosen, respectively, is invariant for the set of
alternatives among which the two alternatives are compared. Thus
for the third assumption we can write

(10.12)    $P_h(\theta) / P_k(\theta) = \pi_{hk}(\theta) / \pi_{kh}(\theta)$ .

From the first two of these assumptions, it is derived that
the conditional distribution of $X_{hk}$, given $\theta$, is also normal,
with $\mu_{hk}(\theta; g)$, or $\mu_{hk}(\theta)$, and $\sigma_{hk}(\theta; g)$, or $\sigma_{hk}(\theta)$, as the
two parameters, which are given by

(10.13)    $\mu_{hk}(\theta) = \mu_h(\theta) - \mu_k(\theta)$

and

(10.14)    $\sigma_{hk}(\theta) = [\sigma_h^2(\theta) + \sigma_k^2(\theta)]^{1/2}$ .

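We can also write for $\pi_{hk}(\theta)$ and $\pi_{kh}(\theta)$ such that

(10.15)    $\pi_{hk}(\theta) = (2\pi)^{-1/2} \sigma_{hk}(\theta)^{-1} \int_{0}^{\infty} \exp[-\{X_{hk} - \mu_{hk}(\theta)\}^2 / \{2\sigma_{hk}^2(\theta)\}] \, dX_{hk}$

and

(10.16)    $\pi_{kh}(\theta) = (2\pi)^{-1/2} \sigma_{hk}(\theta)^{-1} \int_{-\infty}^{0} \exp[-\{X_{hk} - \mu_{hk}(\theta)\}^2 / \{2\sigma_{hk}^2(\theta)\}] \, dX_{hk}$ .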

Now we shall use the logistic approximation to the normal
distribution function, which is, with $D = 1.7$, given by

(10.17)    $(2\pi)^{-1/2} \int_{-\infty}^{u} e^{-t^2/2} \, dt \approx [1 + \exp\{-Du\}]^{-1}$ .
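How close is the approximation (10.17)? The quick numerical scan below compares $\Phi(u)$ from Python's `statistics.NormalDist` with the logistic function at $D = 1.7$ over a grid. The grid range and spacing are arbitrary illustrative choices. The maximum absolute difference comes out just under 0.01, in line with the usual statement that this approximation is accurate to about 0.01.

```python
import math
from statistics import NormalDist

D = 1.7
phi = NormalDist()

def logistic(u):
    """Right-hand side of (10.17)."""
    return 1.0 / (1.0 + math.exp(-D * u))

# |Phi(u) - logistic(u)| is symmetric about u = 0, and both curves are within
# about 1e-4 of their asymptotes beyond |u| = 6, so this range covers the maximum.
grid = [i / 1000.0 for i in range(-6000, 6001)]
max_error = max(abs(phi.cdf(u) - logistic(u)) for u in grid)
print(round(max_error, 4))
```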

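Thus we obtain from (10.13), (10.14), (10.15), (10.16) and (10.17)

(10.18)    $\pi_{hk}(\theta) / \pi_{kh}(\theta) \approx [1 + \exp\{-D\mu_{hk}(\theta)/\sigma_{hk}(\theta)\}]^{-1} \, [1 - \{1 + \exp(-D\mu_{hk}(\theta)/\sigma_{hk}(\theta))\}^{-1}]^{-1}$
$\qquad = \exp[D\mu_{hk}(\theta)/\sigma_{hk}(\theta)]$
$\qquad = \exp[D\{\mu_h(\theta) - \mu_k(\theta)\}/\{\sigma_h^2(\theta) + \sigma_k^2(\theta)\}^{1/2}]$ .

From (10.12) and (10.18) we can write, using (10.10) and taking the term for $i = k$ as unity,

(10.19)    $[P_k(\theta)]^{-1} = \sum_{i=1}^{m} [P_i(\theta)/P_k(\theta)] = \sum_{i=1}^{m} [\pi_{ik}(\theta)/\pi_{ki}(\theta)]$
$\qquad \approx \sum_{i=1}^{m} \exp[D\{\mu_i(\theta) - \mu_k(\theta)\}/\{\sigma_i^2(\theta) + \sigma_k^2(\theta)\}^{1/2}]$

and then

(10.20)    $P_h(\theta) = P_k(\theta) \, [\pi_{hk}(\theta)/\pi_{kh}(\theta)]$
$\qquad \approx \exp[D\{\mu_h(\theta) - \mu_k(\theta)\}/\{\sigma_h^2(\theta) + \sigma_k^2(\theta)\}^{1/2}] \, [\sum_{i=1}^{m} \exp\{D[\mu_i(\theta) - \mu_k(\theta)]/[\sigma_i^2(\theta) + \sigma_k^2(\theta)]^{1/2}\}]^{-1}$

for $h = 1, 2, \ldots, m$. Note that $k$ is an arbitrarily chosen, fixed
alternative.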


If we add two other assumptions such that: 4) the regression of
the response tendency $X_h$ ($h = 1, 2, \ldots, m$) on $\theta$ is linear; and 5) the conditional
variance of $X_h$, given $\theta$, is constant, i.e.,

(10.21)    $\mu_h(\theta) = a_h^* \theta + c_h^*$

and

(10.22)    $\sigma_h(\theta) = \sigma_h$ ,

then we can write

(10.23)    $D[\mu_h(\theta) - \mu_k(\theta)]/[\sigma_h^2(\theta) + \sigma_k^2(\theta)]^{1/2} = a_h \theta + c_h$ ,

where

(10.24)    $a_h = D(a_h^* - a_k^*)/(\sigma_h^2 + \sigma_k^2)^{1/2}$

and

(10.25)    $c_h = D(c_h^* - c_k^*)/(\sigma_h^2 + \sigma_k^2)^{1/2}$ .


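Substituting (10.23) into (10.20), we obtain

(10.26)    $P_h(\theta) = \exp\{a_h \theta + c_h\} \, [\sum_{i=1}^{m} \exp\{a_i \theta + c_i\}]^{-1}$ .

Thus (10.26) specifies the operating characteristic of the category $h$
in the multinomial model. Note that both the $a_h$'s and the $c_h$'s in (10.26)
are of arbitrary origins, for we have, for arbitrary $d$ and $e$,

(10.27)    $P_h(\theta) = \exp\{a_h \theta + c_h\} \exp\{d\theta + e\} \, [\sum_{i=1}^{m} \exp\{d\theta + e\} \exp\{a_i \theta + c_i\}]^{-1}$
$\qquad = \exp\{(a_h + d)\theta + (c_h + e)\} \, [\sum_{i=1}^{m} \exp\{(a_i + d)\theta + (c_i + e)\}]^{-1}$ .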
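Equation (10.26) has the familiar softmax form, and (10.27) states that the probabilities are unchanged when a common $d$ is added to every $a_h$ and a common $e$ to every $c_h$. The sketch below evaluates (10.26) and checks that invariance numerically; the slopes and intercepts are hypothetical. Subtracting the largest exponent before exponentiating is itself an instance of (10.27), with $d = 0$, and it keeps the computation from overflowing.

```python
import math

def nominal_probabilities(theta, a, c):
    """(10.26): P_h(theta) = exp(a_h theta + c_h) / sum_i exp(a_i theta + c_i)."""
    z = [ah * theta + ch for ah, ch in zip(a, c)]
    z_max = max(z)                        # shift allowed by (10.27), for stability
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [v / total for v in w]

a = [-1.0, 0.0, 0.5, 1.5]    # hypothetical slopes
c = [0.2, 0.5, 0.0, -0.7]    # hypothetical intercepts
theta = 0.4
p = nominal_probabilities(theta, a, c)

# (10.27): shifting every a_h by d and every c_h by e leaves P_h unchanged.
d, e = 3.0, -2.0
p_shifted = nominal_probabilities(theta, [x + d for x in a], [x + e for x in c])
print([round(v, 4) for v in p])
```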

While in the multinomial model we assume $m$ different response
tendencies, their conditional independence, and the invariance of
the ratio of the two probabilities of alternative selection, in the
normal ogive model on the graded response level we assume that there
exists a single response tendency, or item variable, $X_g$, or $X$,
behind the selection of any one of the $m$ alternatives, and that the
conditional distribution of $X$, given $\theta$, is normal, with $\mu(\theta)$
and $\sigma(\theta)$ as the two parameters. In addition to this first assumption,
we also assume for the normal ogive model that: 2) the whole dimension
of the item variable $X$ is divided into $m$ subintervals; and 3) the
alternative $h$ will be selected if the examinee's response tendency
is in the subinterval assigned to that category. We can write

(10.28)    $P_h(\theta) = [2\pi]^{-1/2} [\sigma(\theta)]^{-1} \int_{\gamma_{h-1}}^{\gamma_h} \exp[-\{u - \mu(\theta)\}^2 / \{2\sigma(\theta)^2\}] \, du$ ,

where $\gamma_h$ is the upper endpoint of the subinterval of $X$ which is
assigned to the category $h$, and we have

(10.29)    $\gamma_0 = -\infty$ and $\gamma_m = \infty$ .


The additional two assumptions, 4) and 5), for the multinomial model,
which are formulated by (10.21) and (10.22), respectively, are also adopted
for the item variable $X$ in the normal ogive model. Thus we can write,
for the conditional expectation, or regression, of $X$ on $\theta$ and the
conditional variance of $X$, given $\theta$,

(10.30)    $\mu(\theta) = a^* \theta + c^*$ ,

and

(10.31)    $\sigma^2(\theta) = \sigma^2$ .


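Substituting (10.30) and (10.31) into (10.28), we obtain

(10.32)    $P_h(\theta) = [2\pi]^{-1/2} \sigma^{-1} \int_{\gamma_{h-1}}^{\gamma_h} \exp\{-(u - a^*\theta - c^*)^2/(2\sigma^2)\} \, du$
$\qquad = [2\pi]^{-1/2} \int_{(\gamma_{h-1} - a^*\theta - c^*)/\sigma}^{(\gamma_h - a^*\theta - c^*)/\sigma} \exp\{-t^2/2\} \, dt$
$\qquad = [2\pi]^{-1/2} \int_{(a^*\theta + c^* - \gamma_h)/\sigma}^{(a^*\theta + c^* - \gamma_{h-1})/\sigma} \exp\{-t^2/2\} \, dt$ ,

where

(10.33)    $t = (u - a^*\theta - c^*)/\sigma$

in the second line; the third line follows from the symmetry of the standard normal density.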

We define the item parameters, $a_g$, or $a$, and $b_{hg}$, or $b_h$,
such that

(10.34)    $a = a^*/\sigma$

and

(10.35)    $b_h = (\gamma_h - c^*)/a^*$ ,

where

(10.36)    $b_0 = -\infty$ and $b_m = \infty$ .

Substituting (10.34) and (10.35) into (10.32), we obtain for the normal
ogive model on the graded response level

(10.37)    $P_h(\theta) = [2\pi]^{-1/2} \int_{a(\theta - b_h)}^{a(\theta - b_{h-1})} \exp\{-t^2/2\} \, dt$ .

We have seen in the preceding paragraphs that in both the normal
ogive model on the graded response level and Bock's multinomial model
the normal assumption is made for the conditional distribution of the
response tendency, given ability $\theta$. The biggest difference between
the two models is that, in the normal ogive model, a single item
variable is assumed behind the examinee's selection behavior, whereas
in the multinomial model a separate response tendency is assumed for
each of the $m$ alternatives.

These two models, and the logistic model on the graded response
level (Samejima, 1969, 1972), whose operating characteristic, $P_h(\theta)$
($h = 1, 2, \ldots, m$), is given by

(10.38)    $P_h(\theta) = [1 - \exp\{-Da(b_h - b_{h-1})\}] \, [1 + \exp\{-Da(\theta - b_{h-1})\}]^{-1} \, [1 + \exp\{Da(\theta - b_h)\}]^{-1}$ ,

can be used as models for the multiple-choice test item in a
testing situation where guessing is extremely discouraged by our
instructions. "No Answers" will be treated as the response of the
lowest rank in such a situation.
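For comparison, (10.37) and (10.38) can both be evaluated as differences of cumulative curves. The sketch below writes the normal ogive version with `statistics.NormalDist` and the logistic version in the product form of (10.38). Thresholds are passed as $b_1, \ldots, b_{m-1}$, with $b_0 = -\infty$ and $b_m = \infty$ as in (10.36). The function names and parameter values are hypothetical, and both functions assume $a > 0$.

```python
import math
from statistics import NormalDist

D = 1.7
PHI = NormalDist().cdf

def graded_normal_ogive(theta, a, b):
    """(10.37): P_h = Phi(a(theta - b_{h-1})) - Phi(a(theta - b_h)), h = 1..m,
    where b lists b_1..b_{m-1}; b_0 = -inf and b_m = +inf are implicit."""
    cumulative = [1.0] + [PHI(a * (theta - bh)) for bh in b] + [0.0]
    return [cumulative[h - 1] - cumulative[h] for h in range(1, len(cumulative))]

def graded_logistic(theta, a, b):
    """(10.38) in its product form; math.exp(-inf) == 0.0 handles b_0 and b_m."""
    bounds = [-math.inf] + list(b) + [math.inf]
    probs = []
    for h in range(1, len(bounds)):
        lower, upper = bounds[h - 1], bounds[h]
        probs.append((1.0 - math.exp(-D * a * (upper - lower)))
                     / (1.0 + math.exp(-D * a * (theta - lower)))
                     / (1.0 + math.exp(D * a * (theta - upper))))
    return probs

a, b, theta = 1.2, [-1.0, 0.0, 1.5], 0.3   # hypothetical item with m = 4 categories
print([round(p, 4) for p in graded_normal_ogive(theta, a, b)])
print([round(p, 4) for p in graded_logistic(theta, a, b)])
```

The product form avoids subtracting two nearly equal cumulative probabilities, but it gives the same values as the direct difference of logistic curves.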

(X.7)  A New Family of Models for the Multiple-Choice Test Items

Suppose that our multiple-choice test item is constructed well
enough to provide us with $(m-1)$ distractors, which have certain levels
of plausibility to attract examinees. Suppose, further, that there
is some simple statistical relationship between each distractor and
ability $\theta$, i.e., the conditional probability with which the examinee
chooses the distractor $h$ as the correct answer in comparison with all
the other $(m-1)$ alternatives, given $\theta$, increases in $\theta$ up to a
certain level of $\theta$, and then decreases in $\theta$. This implies that there
may be individuals who neither know the answer nor recognize the
plausibility of any distractor. Suppose that the conditional probability
with which the examinee belongs to this category, given $\theta$, is
strictly decreasing in ability $\theta$. If the item characteristic function
is strictly increasing in ability $\theta$ with zero and unity as its two
asymptotes, and if, in addition, the two asymptotes of the "plausibility"
function for each of the $(m-1)$ distractors are uniformly zero, and those
of the conditional probability for the "no recognition" category are
unity and zero, respectively, then both the normal ogive model on the graded
response level and the multinomial model are included in this type of
models.

This type of models is suitable only if the supervision is strict
and the examinees are extremely discouraged from guessing when they do not
know the right answer. It may be more realistic to assume, however,
that in most testing situations the pressure for success is so strong
that the examinees do guess when they have no idea about the correct
answer. Suppose that these examinees guess randomly, and select one
of the $m$ alternatives with equal probability. Thus we obtain a new
family of models, which includes modified forms of such models as the
normal ogive and logistic models on the graded response level and the
multinomial model.

Let $P_{x_g}(\theta)$ be the operating characteristic of the graded response
category $x_g$ ($= 0, 1, 2, \ldots, m_g$), whose mathematical form is given as
$P_h(\theta; g)$ in (10.26), (10.37) or (10.38), or by any other model of
similar characteristics. For convenience, we shall call these models
Type I models on the graded response level. To be specific, models
of Type I are those which satisfy the following.


(1)  $P_{x_g}(\theta)$ is strictly decreasing in $\theta$, with unity and zero
as its two asymptotes, for $x_g = 0$.

(2)  $P_{x_g}(\theta)$ is unimodal with zero as its two asymptotes, for
$x_g = 1, 2, \ldots, (m_g - 1)$.

(3)  $P_{x_g}(\theta)$ is strictly increasing in $\theta$, with zero and unity as
its two asymptotes, for $x_g = m_g$.

The above conditions for Type I models also imply that $\sum_{s=x_g}^{m_g} P_s(\theta)$ is
strictly increasing in $\theta$ with zero and unity as its two asymptotes,
for $x_g = 1, 2, \ldots, m_g$.

We  use  this  additional  response  category  x  -  0  for  those  who 

g 

have  no  idea  at  all  as  to  which  alternative  the  correct  answer  is. 

Thus  the  probability  with  which  the  examinee  belongs  to  this  category 

is  strictly  decreasing  in  9  ,  with  unity  and  zero  as  its  two  asymptotes. 

We  assume  that  the  (m  -1)  distractors  of  the  multiple-choice  item  g 

g 

have  an  Implicit  order  among  themselves.  Thus  the  response  categories, 
x  *1,2, . • • ,(»  -1) ,  are  used  for  the  (m  -1)  distractors,  and  their 

6  O  £ 

operating  characteristics  of  the  distractors  are  unlmodal,  with  zero 

as  their  two  asymptotes,  respectively.  The  other  category,  x  •  m  , 

S  g 

is  for  the  correct  answer,  and  its  operating  characteristic  is  strictly 

increasing  in  9  with  zero  and  unity  as  its  two  asymptotes.  Since, 

in  reality,  the  examinees  who  belong  to  the  category,  x^“  0  ,  are 

assumed  to  guess  randomly,  however,  the  operating  characteristic  for 

this  response  category  disappears,  and  those  of  the  other  categories, 

or  the  m  alternatives,  are  affected  by  this  random  guessing.  The 

g 

operating  characteristic  of  the  alternative  h  can  be  written,  therefore, 
such  that 

(10.39)  $P_h(\theta; g) = P_{x_g}(\theta; x_g = h) + (1/m_g)\,P_{x_g}(\theta; x_g = 0)$ .
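Equation (10.39) is a simple redistribution: the probability of the "no idea" category $x_g = 0$ is shared equally among the $m_g$ alternatives. The following is a minimal Python sketch. It assumes that the graded-level probabilities at one value of $\theta$ are already available as a list ordered $x_g = 0, 1, \ldots, m_g$. The function name and the example numbers are hypothetical.

```python
def alternative_probs(category_probs):
    """Apply (10.39): P_h = P_{x=h} + (1/m) * P_{x=0} for h = 1..m.

    category_probs lists the graded-level probabilities for x = 0..m at one
    value of theta; the result lists the m alternative probabilities.
    """
    p_none = category_probs[0]       # examinees with no idea guess at random
    others = category_probs[1:]      # distractors 1..m-1, then the correct answer m
    m = len(others)
    return [p + p_none / m for p in others]


# Hypothetical graded-level probabilities at some theta (m = 4 alternatives).
probs = alternative_probs([0.20, 0.10, 0.15, 0.25, 0.30])
print(probs)       # the "no idea" share 0.20 is split as 0.05 per alternative
print(sum(probs))  # the alternatives still sum to one, up to rounding
```

The redistributed mass adds back up to $P_{x_g}(\theta; x_g = 0)$. The alternatives' probabilities therefore still total one at every $\theta$.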

Thus we have obtained a new family of models for the multiple-choice item. When $P_{x_g}(\theta)$ follows the normal ogive model on the graded response level, $P_h(\theta; g)$, or $P_h(\theta)$, takes on a form such that


(10.40)  $P_h(\theta) = (2\pi)^{-1/2}\left[\int_{a(\theta - b_{h+1})}^{a(\theta - b_h)} e^{-u^2/2}\,du + (1/m)\int_{a(\theta - b_1)}^{\infty} e^{-u^2/2}\,du\right]$ ,

where $a > 0$, and

(10.41)  $-\infty < b_1 < b_2 < \cdots < b_m < b_{m+1} = \infty$ .


For simplicity, we shall call it Model A of Type I for the multiple-choice item. When $P_{x_g}(\theta)$ in (10.39) is specified by the logistic model on the graded response level, we can write

(10.42)  $P_h(\theta) = [1 - \exp\{-Da(b_{h+1} - b_h)\}]\,[1 + \exp\{-Da(\theta - b_h)\}]^{-1}\,[1 + \exp\{Da(\theta - b_{h+1})\}]^{-1} + [m\{1 + \exp[Da(\theta - b_1)]\}]^{-1}$ ,

where $a > 0$, and the inequality (10.41) also holds. We shall call it Model B of Type I for the multiple-choice item. When the operating characteristic of the category $x_g$ in the multinomial model is substituted for $P_{x_g}(\theta)$ in (10.39), we obtain

(10.43)  $P_h(\theta) = [\exp\{a_h\theta + c_h\} + (1/m)\exp\{a_0\theta + c_0\}]\left[\sum_{k=0}^{m}\exp\{a_k\theta + c_k\}\right]^{-1}$ ,

where

(10.44)  $a_0 < a_1 < a_2 < \cdots < a_m$ .

We shall call it Model C of Type I for the multiple-choice item, or the Bock-Samejima model for the multiple-choice item.

For the purpose of illustration, Figure 10-7-1 presents the operating characteristics of the six response categories following the normal ogive model on the graded response level, with $a_g = 2.00$, $b_1 = -2.00$, $b_2 = -1.00$, $b_3 = 0.00$, $b_4 = 1.00$ and $b_5 = 2.00$. The modal point of the operating characteristic of each of the $(m-1)$ intermediate categories is given by $(b_h + b_{h+1})/2$ (Samejima, 1969). Figure 10-7-2 presents the corresponding operating characteristics of the five alternatives following Model A of Type I for the multiple-choice item. We can see that these curves are no longer symmetric for $h = 1, 2, 3$ and $4$. The figure also shows that the asymptotes of these operating characteristics at $\theta \to -\infty$ are uniformly 0.2, or $1/m$.
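The two observations above can be checked numerically. The following is a minimal sketch. It assumes Python's standard `math.erf` for the normal distribution function, and all function names are hypothetical. It computes the graded normal ogive categories and the Model A alternatives for the parameters of Figure 10-7-1. It then locates the mode of each intermediate category on a grid and evaluates the alternatives far below the item's range.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def graded_normal_ogive(theta, a, b):
    """Category probabilities P_0..P_m of the graded normal ogive model.

    b holds b_1 < ... < b_m; the cumulative P*_0 = 1 and P*_{m+1} = 0.
    """
    cum = [1.0] + [normal_cdf(a * (theta - bx)) for bx in b] + [0.0]
    return [cum[x] - cum[x + 1] for x in range(len(b) + 1)]


def model_a(theta, a, b):
    """Alternatives h = 1..m of Model A: (10.39) applied to the normal ogive."""
    p = graded_normal_ogive(theta, a, b)
    m = len(b)
    return [p[h] + p[0] / m for h in range(1, m + 1)]


# Parameters of Figures 10-7-1 and 10-7-2.
A_G = 2.00
B = [-2.00, -1.00, 0.00, 1.00, 2.00]

# Grid search for the mode of each intermediate category x = 1..m-1.
grid = [i / 1000.0 for i in range(-4000, 4001)]
modes = []
midpoints = []
for x in range(1, len(B)):
    modes.append(max(grid, key=lambda t: graded_normal_ogive(t, A_G, B)[x]))
    midpoints.append((B[x - 1] + B[x]) / 2.0)  # (b_x + b_{x+1}) / 2

print(modes)
print(midpoints)
print(model_a(-6.0, A_G, B))  # each alternative close to 1/m = 0.2
```

On this grid the modes fall on the midpoints $(b_x + b_{x+1})/2$ to within the grid spacing. At $\theta = -6$ every alternative is within rounding of $1/m = 0.2$, which is the asymptote read from Figure 10-7-2.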

Figures 10-7-3 and 10-7-4 present the operating characteristics of the six response categories in the logistic model, and those of the five alternatives in Model B, which follow the mathematical form given as (10.42), for a hypothetical multiple-choice item. In these figures, the item parameters are $a_g = 1.00$, $b_1 = -1.50$, $b_2 = -1.00$, $b_3 = -0.50$, $b_4 = 0.00$ and $b_5 = 0.50$.
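The closed form (10.42) is algebraically the graded logistic category probability $P^*_h(\theta) - P^*_{h+1}(\theta)$ plus the guessing share $(1/m)[1 - P^*_1(\theta)]$. The following is a minimal sketch that evaluates it for the parameters of Figures 10-7-3 and 10-7-4. It assumes $D = 1.7$, which matches the asymptote $1.7a$ quoted later in this chapter, and the helper names are hypothetical.

```python
import math

D = 1.7  # scaling constant; the asymptote 1.7a quoted later implies D = 1.7


def p_star(theta, a, bx):
    """Cumulative logistic P*_x(theta) = [1 + exp{-Da(theta - b_x)}]^(-1)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - bx)))


def model_b(theta, a, b):
    """Alternatives h = 1..m of Model B in the closed form (10.42).

    b holds b_1 < ... < b_m; b_{m+1} is +infinity, so for the correct
    answer (h = m) the first term reduces to P*_m(theta).
    """
    m = len(b)
    guess = 1.0 / (m * (1.0 + math.exp(D * a * (theta - b[0]))))
    out = []
    for h in range(1, m + 1):
        bh = b[h - 1]
        if h < m:
            bh1 = b[h]
            k = 1.0 - math.exp(-D * a * (bh1 - bh))
            core = k / ((1.0 + math.exp(-D * a * (theta - bh)))
                        * (1.0 + math.exp(D * a * (theta - bh1))))
        else:
            core = p_star(theta, a, bh)
        out.append(core + guess)
    return out


# Parameters of Figures 10-7-3 and 10-7-4.
A_PAR = 1.00
B = [-1.50, -1.00, -0.50, 0.00, 0.50]
print(model_b(0.0, A_PAR, B))
```

Writing the product form rather than the difference $P^*_h - P^*_{h+1}$ avoids subtracting two numbers close to one at high $\theta$. Both forms give the same values to rounding error.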

FIGURE 10-7-2
Operating Characteristics of Five Alternatives Following Model A of Type I for the Multiple-Choice Item. The Parameters Are: $a_g = 2.00$, $b_1 = -2.00$, $b_2 = -1.00$, $b_3 = 0.00$, $b_4 = 1.00$ and $b_5 = 2.00$.

As the third example, Figures 10-7-5 and 10-7-6 present the operating characteristics of the five response categories following the multinomial model, and those of the four alternatives in Model C, which are given by (10.43). The item parameters for this pair of operating characteristics are $a_1 = -1.00$, $a_2 = -0.50$, $a_3 = 0.00$, $a_4 = 0.50$, $a_5 = 1.00$, $c_1 = 1.00$, $c_2 = -0.50$, $c_3 = 0.00$, $c_4 = -1.25$ and $c_5 = 0.75$.
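Model C, (10.43), needs only the category exponents $\exp\{a_k\theta + c_k\}$. The following is a minimal sketch using the parameters just listed. It assumes that the first listed pair, labelled $a_1, c_1$ in the figures, plays the role of $(a_0, c_0)$ for the "no idea" category in (10.43), since the slopes are listed in increasing order as (10.44) requires. The function name is hypothetical.

```python
import math


def model_c(theta, a, c):
    """Alternatives h = 1..m of Model C, following (10.43).

    a and c list the slopes and intercepts of the categories 0..m in
    increasing order of slope (10.44); entry 0 is the 'no idea' category.
    """
    z = [math.exp(ak * theta + ck) for ak, ck in zip(a, c)]
    total = sum(z)
    m = len(a) - 1
    return [(z[h] + z[0] / m) / total for h in range(1, m + 1)]


# Parameters of Figures 10-7-5 and 10-7-6, listed there as a_1..a_5 and c_1..c_5.
A = [-1.00, -0.50, 0.00, 0.50, 1.00]
C = [1.00, -0.50, 0.00, -1.25, 0.75]

low = model_c(-40.0, A, C)   # far below: every alternative near 1/m = 0.25
high = model_c(40.0, A, C)   # far above: the correct answer (last entry) near 1
print(low)
print(high)
```

Because the smallest slope belongs to the "no idea" category, that category dominates as $\theta \to -\infty$. Each of the four alternatives then tends to $1/m = 0.25$, just as the Model A alternatives tend to $1/m$.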


(X.8) Basic Functions and Information Functions of the New Models

The basic function, $A_{x_g}(\theta)$, which is defined by (3.14) in Section III.5, has an essential role in the numerical solution of the maximum likelihood estimation of the examinee's ability. This section uses a sufficient condition for a model defined on the graded response level to provide a unique maximum likelihood estimate for every possible response pattern. We call it the unique maximum condition: the basic function is strictly decreasing in $\theta$, with a non-negative asymptote at $\theta \to -\infty$ and a non-positive asymptote at $\theta \to \infty$, for every item response category (cf. Samejima, 1969, 1972).


FIGURE 10-7-3
Operating Characteristics of Six Item Response Categories Following the Logistic Model, with $a_g = 1.00$, $b_1 = -1.50$, $b_2 = -1.00$, $b_3 = -0.50$, $b_4 = 0.00$ and $b_5 = 0.50$.

FIGURE 10-7-4
Operating Characteristics of Five Alternatives Following Model B of Type I for the Multiple-Choice Item, with the Parameters $a = 1.00$, $b_1 = -1.50$, $b_2 = -1.00$, $b_3 = -0.50$, $b_4 = 0.00$ and $b_5 = 0.50$.

FIGURE 10-7-5
Operating Characteristics of Five Item Response Categories Following the Multinomial Model. The Item Parameters Are: $a_1 = -1.000$, $a_2 = -0.500$, $a_3 = 0.000$, $a_4 = 0.500$, $a_5 = 1.000$; $c_1 = 1.000$, $c_2 = -0.500$, $c_3 = 0.000$, $c_4 = -1.250$, $c_5 = 0.750$.

FIGURE 10-7-6
Operating Characteristics of Four Alternatives Following Model C of Type I for the Multiple-Choice Item, with the Parameters $a_1 = -1.000$, $a_2 = -0.500$, $a_3 = 0.000$, $a_4 = 0.500$, $a_5 = 1.000$; $c_1 = 1.000$, $c_2 = -0.500$, $c_3 = 0.000$, $c_4 = -1.250$, $c_5 = 0.750$.


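The analogous basic function can be defined for the multiple-choice item. For the basic function, $A_h(\theta)$, of the alternative $h$ we have

(10.45)  $A_h(\theta) = \frac{\partial}{\partial\theta}\log P_h(\theta) = \left[\frac{\partial}{\partial\theta}P_h(\theta)\right][P_h(\theta)]^{-1}$ .

For the alternative information function, $I_h(\theta)$, we can write

(10.46)  $I_h(\theta) = -\frac{\partial}{\partial\theta}A_h(\theta) = [A_h(\theta)]^2 - \left[\frac{\partial^2}{\partial\theta^2}P_h(\theta)\right][P_h(\theta)]^{-1}$ .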

The item information function of the multiple-choice item is the conditional expectation of the alternative information function, given $\theta$, such that

(10.47)  $I_g(\theta) = \sum_{h=1}^{m} I_h(\theta)\,P_h(\theta) = \sum_{h=1}^{m}[A_h(\theta)]^2\,P_h(\theta)$ .
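The relations (10.45) to (10.47) can be evaluated for any model once $P_h(\theta)$ is available. The following is a minimal sketch for Model B. It assumes $D = 1.7$ and the parameters of Figure 10-7-4. The derivatives are replaced by central finite differences, a numerical shortcut rather than the analytic route taken in the text. The two expressions in (10.47) should agree up to discretization error, because the second derivatives of the $P_h(\theta)$ sum to zero when the $P_h(\theta)$ sum to one.

```python
import math

D = 1.7
A_PAR = 1.00
B = [-1.50, -1.00, -0.50, 0.00, 0.50]  # parameters of Figure 10-7-4


def model_b(theta):
    """Model B alternatives h = 1..m, written as P*_h - P*_{h+1} + (1 - P*_1)/m."""
    ps = [1.0] + [1.0 / (1.0 + math.exp(-D * A_PAR * (theta - bx))) for bx in B] + [0.0]
    m = len(B)
    return [ps[h] - ps[h + 1] + (1.0 - ps[1]) / m for h in range(1, m + 1)]


def basic_and_information(theta, step=1e-4):
    """Central-difference versions of (10.45) and (10.46) for every alternative."""
    p = model_b(theta)
    p_up = model_b(theta + step)
    p_dn = model_b(theta - step)
    basic, info = [], []
    for ph, pu, pd in zip(p, p_up, p_dn):
        d1 = (pu - pd) / (2.0 * step)              # dP_h / dtheta
        d2 = (pu - 2.0 * ph + pd) / step ** 2      # d2P_h / dtheta2
        a_h = d1 / ph                              # basic function A_h
        basic.append(a_h)
        info.append(a_h ** 2 - d2 / ph)            # alternative information I_h
    return p, basic, info


def item_information(theta):
    """(10.47) evaluated both ways: sum of I_h P_h and sum of A_h^2 P_h."""
    p, basic, info = basic_and_information(theta)
    via_info = sum(i_h * p_h for i_h, p_h in zip(info, p))
    via_basic = sum(a_h ** 2 * p_h for a_h, p_h in zip(basic, p))
    return via_info, via_basic


for t in (-2.0, 0.0, 1.5):
    print(t, item_information(t))
```

The step size trades truncation error against rounding error. With `step=1e-4` the two forms of (10.47) agree to many decimal places for this item.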

It should be noted that, if we adopt one of the models for the multiple-choice item, i.e., Models A, B and C, these basic functions and information functions take more complicated forms than the corresponding functions for the graded item response categories. We shall take Model B of Type I as an example and observe its basic functions and information functions, which are given by (10.45), (10.46) and (10.47). These functions in Model B will be compared with those in the logistic model on the graded response level that shares the same parameters.

Let $P^*_h(\theta)$ be such that

(10.48)  $P^*_h(\theta) = [1 + \exp\{-Da(\theta - b_h)\}]^{-1}$ .

Then we can rewrite (10.42) for the operating characteristic of the alternative $h$ in the form


(10.49)  $P_h(\theta) = [1 - \exp\{-Da(b_{h+1} - b_h)\}]\,P^*_h(\theta)\,[1 - P^*_{h+1}(\theta)] + (1/m)[1 - P^*_1(\theta)]$ .

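From (10.49) we obtain the first and second derivatives of $P_h(\theta)$ such that

(10.50)  $\frac{\partial}{\partial\theta}P_h(\theta) = Da\,\{1 - \exp[-Da(b_{h+1} - b_h)]\}\,P^*_h(\theta)\,[1 - P^*_{h+1}(\theta)]\,[1 - P^*_h(\theta) - P^*_{h+1}(\theta)] - (1/m)\,Da\,P^*_1(\theta)\,[1 - P^*_1(\theta)]$ ,

(10.51)  $\frac{\partial^2}{\partial\theta^2}P_h(\theta) = D^2a^2\,\{1 - \exp[-Da(b_{h+1} - b_h)]\}\,P^*_h(\theta)\,[1 - P^*_{h+1}(\theta)]\,\{[1 - P^*_h(\theta) - P^*_{h+1}(\theta)]^2 - P^*_h(\theta)[1 - P^*_h(\theta)] - P^*_{h+1}(\theta)[1 - P^*_{h+1}(\theta)]\} - (1/m)\,D^2a^2\,P^*_1(\theta)\,[1 - P^*_1(\theta)]\,[1 - 2P^*_1(\theta)]$ .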
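The analytic derivatives (10.50) and (10.51) can be checked against finite differences of (10.49). The following is a minimal sketch. It assumes $D = 1.7$ and the parameters of Figure 10-7-4, and the function name is hypothetical. For the correct answer, $h = m$, the factor $1 - \exp\{-Da(b_{m+1} - b_m)\}$ is taken as 1 and $P^*_{m+1}(\theta)$ as 0, following $b_{m+1} = \infty$.

```python
import math

D = 1.7
A_PAR = 1.00
B = [-1.50, -1.00, -0.50, 0.00, 0.50]  # parameters of Figure 10-7-4


def model_b_derivatives(theta):
    """(P_h, dP_h/dtheta, d2P_h/dtheta2) for h = 1..m from (10.49)-(10.51)."""
    da = D * A_PAR
    m = len(B)
    ps = [1.0] + [1.0 / (1.0 + math.exp(-da * (theta - bx))) for bx in B] + [0.0]
    q1 = ps[1] * (1.0 - ps[1])                      # P*_1 [1 - P*_1]
    out = []
    for h in range(1, m + 1):
        # 1 - exp{-Da(b_{h+1} - b_h)}; equals 1 for h = m because b_{m+1} = +inf
        k = 1.0 if h == m else 1.0 - math.exp(-da * (B[h] - B[h - 1]))
        f = ps[h] * (1.0 - ps[h + 1])               # P*_h [1 - P*_{h+1}]
        g = 1.0 - ps[h] - ps[h + 1]
        curv = g ** 2 - ps[h] * (1.0 - ps[h]) - ps[h + 1] * (1.0 - ps[h + 1])
        p = k * f + (1.0 - ps[1]) / m                                        # (10.49)
        d1 = da * k * f * g - da * q1 / m                                    # (10.50)
        d2 = da ** 2 * k * f * curv - da ** 2 * q1 * (1.0 - 2.0 * ps[1]) / m  # (10.51)
        out.append((p, d1, d2))
    return out


for h, (p, d1, d2) in enumerate(model_b_derivatives(0.0), start=1):
    print(h, round(p, 4), round(d1, 4), round(d2, 4))
```

Agreement with the finite differences is a quick way to catch sign or index slips when the formulas are transcribed.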

It is noted that the last term in each of (10.49), (10.50) and (10.51) is the term that makes the function differ from the corresponding function in the logistic model on the graded response level. The size of the effect that these additional terms have on the basic functions and the information functions at different levels of $\theta$ depends upon the parameter $b_1$, that is, $b_h$ for $h = 1$. If these additional terms do not exist, i.e., in the logistic model on the graded response level, we can write for the basic functions and the information functions

(10.52)  $A_h(\theta) = Da[1 - P^*_h(\theta) - P^*_{h+1}(\theta)]$ ,

(10.53)  $I_h(\theta) = D^2a^2[P^*_h(\theta)\{1 - P^*_h(\theta)\} + P^*_{h+1}(\theta)\{1 - P^*_{h+1}(\theta)\}]$ ,

where $h = 0, 1, 2, \ldots, m$ (with $P^*_0(\theta) = 1$ and $P^*_{m+1}(\theta) = 0$), and

(10.54)  $I_g(\theta) = D^2a^2\sum_{h=0}^{m}[1 - P^*_h(\theta) - P^*_{h+1}(\theta)]^2\,[P^*_h(\theta) - P^*_{h+1}(\theta)]$ .
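For the graded logistic model, (10.52) to (10.54) are closed forms. The following is a minimal sketch that evaluates them. It assumes $D = 1.7$ and the parameters of Figure 10-7-3, and the function names are hypothetical. The categories are indexed $h = 0, \ldots, m$ as in (10.53), so the six curves of the upper graph of Figure 10-8-1 are entries 0 to 5 here. The code checks the asymptotes $\pm 1.7a$ and the zero of each intermediate basic function at the midpoint of its two bounding parameters.

```python
import math

D = 1.7
A_PAR = 1.00
B = [-1.50, -1.00, -0.50, 0.00, 0.50]  # parameters of Figure 10-7-3


def graded_logistic(theta):
    """Category probabilities, basic functions (10.52) and item response
    information functions (10.53) for categories h = 0..m."""
    da = D * A_PAR
    ps = [1.0] + [1.0 / (1.0 + math.exp(-da * (theta - bx))) for bx in B] + [0.0]
    m = len(B)
    probs = [ps[h] - ps[h + 1] for h in range(m + 1)]
    basic = [da * (1.0 - ps[h] - ps[h + 1]) for h in range(m + 1)]
    info = [da ** 2 * (ps[h] * (1.0 - ps[h]) + ps[h + 1] * (1.0 - ps[h + 1]))
            for h in range(m + 1)]
    return probs, basic, info


def item_information(theta):
    """(10.54): the sum over h of A_h(theta)^2 P_h(theta)."""
    probs, basic, _ = graded_logistic(theta)
    return sum(a_h ** 2 * p_h for a_h, p_h in zip(basic, probs))


_, basic_low, _ = graded_logistic(-20.0)   # h >= 1 near 1.7a, h = 0 near 0
_, basic_high, _ = graded_logistic(20.0)   # h <= m-1 near -1.7a, h = m near 0
print([round(v, 6) for v in basic_low])
print([round(v, 6) for v in basic_high])
```

Because each (10.53) term is exactly $-\partial A_h/\partial\theta$, the sum of $I_h P_h$ over the categories reproduces (10.54) to rounding error.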


Figure 10-8-1 presents two sets of basic functions for the hypothetical test item whose operating characteristics are shown in Figures 10-7-3 and 10-7-4. The first set is that of the six categories in the logistic model, given by (10.52). The second is that of the five alternatives in Model B of Type I for the multiple-choice item, obtained by substituting (10.49) and (10.50) into (10.45). As we can see in the first graph, all six basic functions in the logistic model are strictly decreasing in $\theta$. They share the common asymptote $1.7a$ at $\theta \to -\infty$ for $h = 2, 3, 4, 5, 6$ and $-1.7a$ at $\theta \to \infty$ for $h = 1, 2, 3, 4, 5$. For $h = 6$ the asymptote at $\theta \to \infty$ is zero, and for $h = 1$ the one at $\theta \to -\infty$ is zero (cf. Samejima, 1969). It should also be noted that the basic functions of the four intermediate categories, $h = 2, 3, 4, 5$, take on zero at $\theta = (b_{h-1} + b_h)/2$, the midpoints of adjacent difficulty parameters (with the categories numbered from 1 to 6 here).

FIGURE 10-8-1
Basic Functions of Six Item Response Categories in the Logistic Model (Above), and Those of Five Alternatives in Model B (Below). The Item Parameters Are: $a = 1.00$, $b_1 = -1.50$, $b_2 = -1.00$, $b_3 = -0.50$, $b_4 = 0.00$ and $b_5 = 0.50$.

We find quite a contrasting set of five basic functions in the second graph of Figure 10-8-1. In fact, none of these basic functions is strictly decreasing in $\theta$. Instead, each has a unique modal point and, except for $h = 1$, a unique local minimum as well. The common asymptote at $\theta \to \infty$ for the alternatives other than the correct answer is $-1.7a$, just as in the logistic model. The other common asymptote at $\theta \to -\infty$ is zero, and so is the asymptote at $\theta \to \infty$ for the correct answer, as is expected from (10.49) and (10.50). It is very obvious from these results that Model B does not satisfy the unique maximum condition, and, therefore, a unique maximum likelihood estimate is not assured for every possible response pattern. We therefore need to pursue the characteristics of this model further and find some practical solution for this problem, as was done for the three-parameter logistic model (Samejima, 1973).
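The contrast between the two graphs of Figure 10-8-1 can be reproduced numerically. The following is a minimal sketch. It assumes $D = 1.7$ and the item parameters of Figure 10-8-1, and the function names are hypothetical. It evaluates both sets of basic functions on a grid over $-4 \le \theta \le 4$ and tests each one for being strictly decreasing, which is the shape part of the unique maximum condition.

```python
import math

D = 1.7
A_PAR = 1.00
B = [-1.50, -1.00, -0.50, 0.00, 0.50]  # parameters of Figures 10-7-3 and 10-8-1


def p_stars(theta):
    """[P*_0, P*_1, ..., P*_m, P*_{m+1}] with P*_0 = 1 and P*_{m+1} = 0."""
    return [1.0] + [1.0 / (1.0 + math.exp(-D * A_PAR * (theta - bx))) for bx in B] + [0.0]


def basic_model_b(theta):
    """A_h(theta) = P_h'(theta) / P_h(theta) for Model B, h = 1..m."""
    ps = p_stars(theta)
    m = len(B)
    guess_p = (1.0 - ps[1]) / m                         # (1/m)[1 - P*_1]
    guess_d = -D * A_PAR * ps[1] * (1.0 - ps[1]) / m    # its derivative
    out = []
    for h in range(1, m + 1):
        graded_p = ps[h] - ps[h + 1]
        graded_d = D * A_PAR * graded_p * (1.0 - ps[h] - ps[h + 1])
        out.append((graded_d + guess_d) / (graded_p + guess_p))
    return out


def basic_logistic(theta):
    """A_h(theta) of the graded logistic model, (10.52), h = 0..m."""
    ps = p_stars(theta)
    return [D * A_PAR * (1.0 - ps[h] - ps[h + 1]) for h in range(len(B) + 1)]


def strictly_decreasing(values):
    return all(v1 < v0 for v0, v1 in zip(values, values[1:]))


grid = [-4.0 + 0.05 * i for i in range(161)]            # theta from -4 to 4
model_b_curves = list(zip(*[basic_model_b(t) for t in grid]))
logistic_curves = list(zip(*[basic_logistic(t) for t in grid]))
print([strictly_decreasing(c) for c in logistic_curves])   # shape required by the condition
print([strictly_decreasing(c) for c in model_b_curves])    # fails for every Model B alternative
```

A grid test of this kind can only show a violation, not prove monotonicity. Here, however, the Model B curves rise by clearly visible amounts, for example $A_1$ between $\theta = -4$ and $\theta = -3$.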

We notice that, for certain intervals of higher ability, these basic functions are practically identical with the corresponding curves in the logistic model. Needless to say, it is desirable that these intervals start from relatively low levels of ability $\theta$. It is obvious that the lower endpoint of such an interval depends upon the parameter $b_1$, which is indicated by an arrow in the graph of Model B.

Figure 10-8-2 presents, for the same multiple-choice item, the item response information functions in the logistic model in its upper part and the corresponding alternative information functions in Model B in its lower part. Among each of the two sets of six and five curves, we find the item information function, which is drawn as a thicker, dashed line. As was pointed out earlier, in the logistic model all six item response information functions are positive for the entire range of $\theta$, while the same is not true for the five alternative information functions in Model B. This result is expected from the behavior of the basic functions observed earlier in this section.

FIGURE 10-8-2
Item Response Information Functions (Various Thinner Curves) and the Item Information Function (Heavy Dashes) in the Logistic Model (Above) and Those in Model B of Type I for the Multiple-Choice Item (Below). The Item Parameters Are: $a = 1.00$, $b_1 = -1.50$, $b_2 = -1.00$, $b_3 = -0.50$, $b_4 = 0.00$ and $b_5 = 0.50$.

The usefulness of the item information function has been emphasized earlier (Samejima, 1977), especially in connection with the maximum likelihood estimation of the examinee's ability. It should be noted, however, that the blind use of the item information function, or the test information function, is harmful when the item response information functions, or the alternative information functions, are not always non-negative. This is exemplified in the criticism of the three-parameter logistic model (Samejima, 1973). With models of high complexity, like Model B, care should be taken to find out the limitations in using the item information function.

As the logical consequence of the observations made earlier for the basic functions, we find that for a certain interval of $\theta$, which covers high levels, the item information function in Model B is practically identical with the corresponding item information function in the logistic model. We can see in Figure 10-8-2 that this interval is approximately $(0.4, \infty)$. It is also noted that in this interval each alternative information function is practically identical with its counterpart in the logistic model. This indicates that the effect of the noise caused by random guessing is negligibly small there. We can therefore expect the accuracy of ability estimation in this interval of $\theta$ to be just as high as in the logistic model.

(X.9)  Instructions  and  Mathematical  Models 

As was pointed out in Section X.5, an interesting aspect of the role of mathematical models in psychology is that we have control over the stimuli to which persons react. In testing, not only can we develop the kinds of test items which, with an appropriate theory, enable us to measure the examinee's ability accurately, but we can also direct the examinee to react in certain ways by giving him suitable instructions.

If, for instance, the examinee is encouraged to guess randomly when he does not know the answer, then models like the three-parameter logistic model will be appropriate. It is not wise to give such instructions, however, since, in so doing, we fail to obtain and make use of the information given by the distractors. Suppose our instructions strongly discourage the examinee from guessing and convince him that it is wise to leave the question unanswered rather than to guess. Then models like Bock's model and the normal ogive model on the graded response level will be appropriate. If our instructions discourage the examinee from guessing but encourage him to answer each question one way or another, then models like Models A, B and C will be appropriate.

There is no question that it is better to create a situation in which no noise is involved, so that we can use models like Bock's model or the normal ogive model on the graded response level. It may not be easy, however, to make suitable instructions for this purpose without more or less deceiving the examinee. Besides, the pressure for success is so strong in most testing situations that the examinee may turn to guessing as the last resort regardless of the instructions. For this reason, Models A, B and C may be more realistic.

In any case, before selecting a specific model for our data, it is advisable to estimate the operating characteristics of at least some test items without assuming any mathematical form, using the methods and approaches introduced in Chapters 3, 5 and 6. In so doing, if we find a substantially large number of examinees who skipped a test item, we shall estimate the operating characteristic for the "omission" of that item. The estimated operating characteristic may turn out to be informative, by providing us with a strictly decreasing function of ability or a unimodal function. In that case we decide that the category of "omission" is informative and use its operating characteristic in our process of ability estimation. It may instead turn out to be non-informative, by giving us no simple statistical relationship with ability, or a constant function. In that case we decide that the category of "omission" is useless in our process of ability estimation, and simply ignore it.
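The decision rule for the "omission" category can be sketched as a shape test on an estimated operating characteristic sampled on an increasing grid of $\theta$. This is a simplified, hypothetical rule. An estimated curve is noisy, so a real application would smooth it or use a tolerance suited to its sampling error. The labels, the tolerance and the function names below are assumptions.

```python
def classify_shape(values, tol=1e-9):
    """Classify a curve sampled on an increasing theta grid as
    'decreasing', 'unimodal' or 'other' (tol is a hypothetical noise floor)."""
    diffs = [v1 - v0 for v0, v1 in zip(values, values[1:])]
    if all(d < -tol for d in diffs):
        return "decreasing"
    peak = max(range(len(values)), key=values.__getitem__)  # first maximum
    rises = all(d > tol for d in diffs[:peak])               # strictly up to the peak
    falls = all(d < -tol for d in diffs[peak:])              # strictly down after it
    if 0 < peak < len(values) - 1 and rises and falls:
        return "unimodal"
    return "other"


def omission_is_informative(values):
    """Keep the 'omission' category only if its curve is decreasing or unimodal."""
    return classify_shape(values) in ("decreasing", "unimodal")


print(omission_is_informative([0.9, 0.7, 0.4, 0.2, 0.1]))   # decreasing
print(omission_is_informative([0.1, 0.3, 0.5, 0.3, 0.1]))   # unimodal
print(omission_is_informative([0.2, 0.2, 0.2, 0.2, 0.2]))   # constant
```

A constant curve and an irregular curve both fall into "other". This matches the text's treatment of a non-informative "omission" category, which is simply ignored.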

(X.10)  A  New  Approach  to  Data  Analysis 

Tables 10-10-1 and 10-10-2 present two contingency tables, each of which cross-classifies the four alternatives, A, B, C and D, of a multiple-choice test item against five ability groups of examinees. They were sampled from those made in the preliminary research conducted at Educational Testing Service, which had been given to the author through the courtesy of Mr. Donald Raske.

We notice in Table 10-10-1 that for this multiple-choice test item, Item 43, the mode of the frequency for the alternative A is the lowest ability group. The mode for the alternative B is the second highest ability group, that for C, the correct answer, is the highest ability group, and that for D is the second lowest ability group. Thus these four alternatives may be arranged as A, D, B and C in ascending order. Actually, these five ability groups are categorized with respect to the total test score, and they may not give us a very accurate categorization. Yet the table is informative enough to indicate the existence of differential information given by the four alternatives.


TABLE 10-10-1
Contingency Table Between the Four Alternatives and the Five Ability Groups for Item 43 (ETS Data)

Alternative   Very Low   Low   Middle   High   Very High   Total
A                   55    40       26     26          12     159
B                   64    58       68     80          52     322
C                   34    64       70     74         130     372
D                   34    36       29     19           6     124
No Answer            1     0        0      0           0       1
Total              188   198      193    199         200     978

TABLE 10-10-2
Contingency Table Between the Four Alternatives and the Five Ability Groups for Item 46 (ETS Data)

Alternative   Very Low   Low   Middle   High   Very High   Total
A                   70   104       89     69          67     399
B                   31    28       54     99         121     333
C                   42    31       28     15           7     123
D                   43    32       20     11           4     110
No Answer            0     2        0      0           0       2
Total              186   197      191    194         199     967


For Item 46, whose contingency table is given as Table 10-10-2, the modes are the second lowest ability group for the alternative A, the highest ability group for the alternative B, which is the correct answer, and the lowest ability group for both of the alternatives C and D. Unlike Item 43, this test item does not have a very explicit order for its four alternatives, since the alternatives C and D have similar frequency distributions. Thus the existence of differential information among the separate alternatives is less clear. It may be advisable to examine the contents of the alternatives and replace some of them so that we obtain a contingency table similar to the one for Item 43.

This type of contingency table is useful when we design our research project. It is advisable to select a set of test items which have confirmed content validity, and to use their test score as the substitute for ability. In our preliminary study, we can make full use of the contingency table to eventually obtain a set of alternatives which provide us with differential information. We shall then be able to adopt a model which belongs to the Informative Distractor Model (cf. Section IX.9).

It  is  expected  in  our  contingency  table  that  the  correct 
answer  shows  strictly  increasing  frequencies.  It  is  desirable 
that  the  other  alternatives  have  differential  modes  among  the  four 
ability  groups,  "very  low"  through  "high".  It  should  be  noted  that 
we  need  one  alternative  for  each  test  item  which  attracts  examinees 
whose  ability  is  low,  in  order  to  avoid  the  effect  of  random  guessing 
for  a  wider  interval  of  ability.  This  means  that,  in  our  contingency 
table,  one  alternative  should  have  a  distinctly  high  frequency  for 
the  lowest  ability  group. 
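The screening criteria in the last two paragraphs can be applied mechanically to a contingency table. The following is a minimal sketch. It assumes frequencies transcribed from Tables 10-10-1 and 10-10-2 with the "No Answer" row left out. For Item 43, alternative A uses 26 in the Middle group, the value implied by the row and column totals. The "distinct modes" test is a simplified reading of "differential modes", and all names are hypothetical.

```python
GROUPS = ["Very Low", "Low", "Middle", "High", "Very High"]

# Frequencies from Tables 10-10-1 and 10-10-2 ("No Answer" row omitted).
ITEM_43 = {"A": [55, 40, 26, 26, 12], "B": [64, 58, 68, 80, 52],
           "C": [34, 64, 70, 74, 130], "D": [34, 36, 29, 19, 6]}
ITEM_46 = {"A": [70, 104, 89, 69, 67], "B": [31, 28, 54, 99, 121],
           "C": [42, 31, 28, 15, 7], "D": [43, 32, 20, 11, 4]}


def modal_group(freqs):
    """Index of the ability group with the largest frequency."""
    return max(range(len(freqs)), key=freqs.__getitem__)


def screen_item(table, key):
    """Screen one contingency table against the criteria stated in the text."""
    modes = {alt: modal_group(freqs) for alt, freqs in table.items()}
    correct = table[key]
    distractor_modes = [modes[alt] for alt in table if alt != key]
    return {
        "modes": {alt: GROUPS[i] for alt, i in modes.items()},
        # the correct answer should show strictly increasing frequencies
        "correct_increasing": all(y > x for x, y in zip(correct, correct[1:])),
        # the distractors should peak in different ability groups
        "distinct_distractor_modes": len(set(distractor_modes)) == len(distractor_modes),
        # one distractor should attract the lowest ability group
        "low_group_attractor": 0 in distractor_modes,
        # alternatives arranged by modal group (ties keep table order)
        "order_by_mode": sorted(table, key=modes.get),
    }


print(screen_item(ITEM_43, "C"))
print(screen_item(ITEM_46, "B"))
```

For Item 43 the modal groups order the alternatives as A, D, B, C, the ascending order given in the text. For Item 46 the correct answer B is not strictly increasing (31, then 28), and C and D share the lowest group as their mode.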

As was discussed in Section VIII.6, we may find a subset, or subsets, of equivalent test items in the core set of test items of confirmed content validity. We shall then be able to apply the Constant Information Model and use the subset, or subsets, as Old Test. In such a case, the test items do not have to be equivalent in the sense of having identical sets of item characteristic functions plus plausibility functions for all the wrong answers. What we need is identical item characteristic functions, since the test items are treated as binary items when they are used as the substitute for the Old Test.

If we do not find such a subset of the core set of test items, then we shall assume a certain model for the correct answer of each test item, and estimate its parameters. In selecting the model, the contingency table for each core test item should be taken into consideration. Models like the normal ogive model may serve the purpose.

Now we shall obtain the maximum likelihood estimate of each examinee's ability, using our Old Test. Then we shall estimate the item characteristic function of each core test item. For this we use one of the combinations of a method and an approach for the estimation introduced in Chapters 3, 5 and 6. This process is a check of internal consistency, and we shall proceed if the resultant estimated item characteristic function is close enough to the assumed one. The plausibility function for each wrong answer of each core test item will be estimated, using our Old Test. Then we shall estimate the operating characteristic of each alternative of the test items which are not included in the core set of test items, using the same method. After this has been accomplished, we shall examine the resultant set of operating characteristics for each test item, and select an appropriate model. Note that we need not choose a common model for all the test items. We may select several different models for separate subsets of our test items. We shall still be able to estimate the examinee's ability by maximum likelihood estimation, provided that these models satisfy the unique maximum condition (cf. Samejima, 1969, 1972).
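The final estimation step can be illustrated with a small grid-search maximum likelihood estimator. This is a minimal sketch under stated assumptions. Each item is represented by a function that maps $\theta$ to its list of category or alternative probabilities, so different items may follow different models, as the text allows. The graded logistic items, $D = 1.7$, the grid range and resolution, and all names are hypothetical choices. The grid search is swapped in for simplicity and is not the report's numerical procedure, which relies on the basic functions.

```python
import math

D = 1.7
B = [-1.50, -1.00, -0.50, 0.00, 0.50]  # Figure 10-7-3 parameters, used illustratively


def graded_logistic_probs(theta, a, b):
    """Category probabilities P_0..P_m of the graded logistic model."""
    ps = [1.0] + [1.0 / (1.0 + math.exp(-D * a * (theta - bx))) for bx in b] + [0.0]
    return [ps[x] - ps[x + 1] for x in range(len(b) + 1)]


def grid_mle(responses, item_models, lo=-4.0, hi=4.0, steps=8001):
    """Grid-search maximum likelihood estimate of theta.

    responses[i] is the category (or alternative) chosen on item i, and
    item_models[i] maps theta to that item's list of probabilities.
    """
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = sum(math.log(max(model(theta)[x], 1e-300))  # guard against log(0)
                 for x, model in zip(responses, item_models))
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Three hypothetical items sharing the parameters above, all answered in category 2.
items = [lambda t: graded_logistic_probs(t, 1.00, B)] * 3
estimate = grid_mle([2, 2, 2], items)
print(estimate)  # near (b_2 + b_3) / 2 = -0.75, the mode of P_2
```

A grid search always returns some value. Whether that value is the unique maximum still depends on the models satisfying the unique maximum condition, as noted above.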

The above is a brief description of the new approach to data analysis. It is important that we select a core set of test items which have confirmed content validity. Psychometricians should not forget psychological reality, and the operational definition of the ability dimension is by far the most important step. In so doing, statistical techniques like factor analysis may be helpful in determining the dimensionality of ability. After this operational definition of ability has been accomplished with respect to the core set of test items, the operating characteristics of all the other test items must be estimated on this ability. Note that this new approach incorporates most of the main products of the present research. These include the methods and approaches for estimating the operating characteristics of discrete item responses, the new family of models for the multiple-choice test item, the Constant Information Model, and so forth.


REFERENCES

[1] Birnbaum, A. Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord and M. R. Novick, Statistical theories of mental test scores. Addison-Wesley, 1968, Chapters 17-20.

[2] Bock, R. D. Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 1972, 37, 29-51.

[3] Lawley, D. N. and Maxwell, A. E. Factor analysis as a statistical method. London: Butterworth, 1971.

[4] Samejima, F. Application of the graded response model to the nominal response and multiple-choice situations. Chapel Hill, N.C.: University of North Carolina Psychometric Laboratory Report No. 63, 1968.

[5] Samejima, F. Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, No. 17, 1969.

[6] Samejima, F. A general model for free-response data. Psychometrika Monograph, No. 18, 1972.

[7] Samejima, F. A comment on Birnbaum's three-parameter logistic model in the latent trait theory. Psychometrika, 1973, 38, 221-233.

[8] Samejima, F. A use of the information function in tailored testing. Applied Psychological Measurement, 1977, 1, 233-247.

[9] Shiba, S. Construction of a scale for acquisition of word meanings. Bulletin of Faculty of Education, University of Tokyo, 17, 1968. (in Japanese)




XI  Conclusions 

The author has tried to integrate the main topics and contents of the research within the preceding ten chapters of this final report. The work was difficult because of the abundance of products, and the author was forced to drop some relatively minor topics and contents. Because of the shortage of space, only a few figures and tables were selected for each chapter. For example, the estimated item characteristic functions are illustrated for item 6 only, and those for all the other nine binary test items are not shown in this final report. In spite of this, the author hopes that this final report will help the reader to increase his or her understanding of the contents of the research and its implications.

Estimation of the operating characteristics of discrete item responses, as well as that of ability distributions, turned out to be successful without assuming any mathematical forms and with a relatively small number of examinees. Subtest 6, with only eleven test items of three item score categories each, proved to be sufficient to serve as Old Test in the estimation of the operating characteristics. This fact indicates the robustness of the present methods and approaches, and we may be allowed to conclude that it is a remarkable success. This finding may hold not only for unknown binary test items but also for unknown graded test items and multiple-choice test items, or it may not; the conclusion is yet to come. It is the author's wish that other researchers use the methods and approaches on different data to find out how they work.

The new family of models for the multiple-choice test item has been proposed. As models belonging to the Informative Distractor Model, they have shown promise of increasing efficiency in ability estimation and of making the multiple-choice test item much more than a blurred image of the free-response test item. This family of models, combined with the methods and approaches for estimating the operating characteristics of discrete item responses, has given a new direction to research in mental measurement. The new approach must be tested in the near future upon empirical data, such as Shiba's. It is the author's hope that other researchers will also develop suitable test items and conduct research using the procedure proposed in this final report to find out how it works.

The method of moments for approximating any function by a polynomial, which proved to be the least squares solution, has been incorporated effectively. The Constant Information Model has been proposed, and it has found its place in the new direction of research in mental measurement. Alternative estimators to the maximum likelihood estimator have also been proposed; unlike Bayesian estimators, they are population-free. It has been observed how they extend the range of ability for which a specified test is effective and meaningful, while keeping approximate conditional unbiasedness, given ability.

The present research has produced many new theories and methods. It has also raised new problems and topics to pursue in the future. In this sense, even though this is the final report of the present research, we are still midway along the road to advancing latent trait theory and science.


